
AN INTRODUCTION TO 
MATHEMATICAL 
PROBABILITY 




BY 


JULIAN LOWELL COOLIDGE 


PROFESSOR OF MATHEMATICS IN HARVARD UNIVERSITY 


OXFORD UNIVERSITY PRESS 
LONDON: HUMPHREY MILFORD 


OXFORD UNIVERSITY PRESS 
AMEN HOUSE, E.C. 4


LONDON EDINBURGH GLASGOW NEW YORK 
TORONTO MELBOURNE CAPETOWN BOMBAY 
CALCUTTA MADRAS 


HUMPHREY MILFORD 
PUBLISHER TO THE UNIVERSITY 


First edition 1925 


Reprinted photographically in Great Britain in 1942 
by LOWE & BRYDONE, PRINTERS, LONDON, from
corrected sheets of the first edition 


PREFACE


THE present work is based upon the lectures which I have 
delivered, usually in alternate years, at Harvard University. 
It is not intended primarily as a contribution to mathematical 
science, but as a text-book introductory to a branch of 
mathematics which has assumed an unexpected importance 
in recent times. 

There are plenty of good books dealing with the theory of 
mathematical probability. In French we have the beautiful 
treatises of Bertrand* and Poincaré† (the former reads like
a romance, the latter has much of the originality and brilliance
characteristic of the master) as well as the text of Borel‡
on the same high level, to say nothing of others of less
note. In German there is, first of all, the encyclopaedic but
readable text of Czuber,§ the translation of Markhoff,|| with
its unusual attention to rigour, as well as several others.
In Italian there is the recent work of Castelnuovo,¶ careful,
critical, and judicious. How is it in English? There is
only one recent text-book,** that of Fisher, very full in its
treatment of statistics and frequency curves, but omitting
many of the most important parts of the subject. The striking
book on probability by Keynes†† is purely philosophical in

* Calcul des probabilités, Paris, 1889.
† Calcul des probabilités, 2nd ed., Paris, 1912.
‡ Éléments de la théorie des probabilités, Paris, 1908.
§ Wahrscheinlichkeitsrechnung, 2nd ed., Leipzig, 1908.
|| Wahrscheinlichkeitsrechnung, Leipzig, 1912.
¶ Calcolo delle Probabilità, Rome, 1919.
** Mathematical Theory of Probabilities, 2nd ed., New York, 1922.
†† Treatise on Probability, London, 1921.




interest, inclining to the thesis that probability is not a 
mathematical subject anyway. 

It would, of course, be far better if every English-speaking
reader were a sufficient master of foreign languages to study 
all of these excellent texts, but such is manifestly not the 
case. The simple fact is that such readers absolutely will 
not make the linguistic effort necessary. The need for a brief 
but comprehensive English text is obvious, if regrettable. 

From the purely mathematical point of view, the calculus 
of probabilities is somewhat unsatisfactory. To begin with, 
we are forced to use approximate formulae, and it is not 
always easy to have an exact knowledge of their degree 
of exactness, at least without arduous calculations. Then 
certain fundamental laws, like the Gaussian Law of Error, 
are based on a variety of so-called proofs, each making some 
very broad assumptions of doubtful validity. And lastly, 
there is a nasty habit of developing a formula under the 
assumption that it holds for a very limited range, and then 
calculating the constants by computing out to infinity. For 
this reason the mathematician is tempted at times to view 
the whole subject with distrust. This is a mistake. However
the formulae may be derived, they frequently prove
remarkably trustworthy in practice. The proper attitude 
is not to reject laws of doubtful origin, but to scrutinize 
them with care, with a view to reaching the true principles 
underneath. It seems to me that, in the last analysis, 
probability is a statistical, that is to say, an experimental 
science, and the mathematical problem is to establish rules 
which yield correct and valuable results. 

Perhaps the most characteristic feature of the present work 
is that the statistical definition of probability is adhered to 
throughout. This has been done in philosophical discussions, 
and Castelnuovo comes very near to adopting it, but the 
usual method is to have several different definitions of 
probability, and reconcile them tant bien que mal. 




As a matter of history, the calculus of probability started 
with the study of games of chance. The present book does 
the same. Of course, this branch of the subject is not the 
most important to-day, but in studying any science it is wise 
to pay some attention to the problems that gave it birth. 
Moreover, from a didactic point of view, it is doubtful
whether the plan of replacing problems in games of chance 
by problems in life insurance is likely to increase the interest 
of the beginner. On the other hand, the tendency which 
some people show of attempting to solve all problems in 
probability by assimilating them to drawing balls from an 
urn is fundamentally unsound, as it departs from the facts. 

The subjects of mean value and expectation, which have 
always played a central role in the theory of probabilities, 
have taken on additional importance in recent years, owing 
to the idea of dispersion, and its application to statistical 
series. For that reason they have been given a good deal of 
prominence. Per contra, geometrical probability, which is 
little more than a plaything, and the probability of causes, 
which rests on very shaky foundations, are treated briefly. 
Yet they should not be omitted entirely, for the former is 
related to statistical mechanics, and the latter gives the only 
answers we have to certain questions which recur insistently. 

The most important part of the theory is that which deals 
with the distribution of errors of observation. The funda- 
mental question here is what to do with the exponential law 
of Gauss. I have tried to make it as plausible as I could by 
basing it on very broad assumptions, even though this adds 
somewhat to the length of the deduction. I have, however,
given the principles of combining observations as far as 
possible independently of the Gaussian law. The study of 
errors in two dimensions, which formerly interested few but 
students of artillery practice, has taken on a new importance 
through its relation to statistical correlation. 

The treatment of least square and indirect observations 




follows traditional lines. In studying the application of least 
squares to curve fitting, I have briefly explained the modern 
method of moments. I have also included a summary treatment
of the applications of probability to such widely divergent
topics as the kinetic theory of gases and life insurance.

In general, it has been my idea to give the mathematical 
basis underlying each of the important applications of proba- 
bility rather than to write a treatise on games of chance, or 
errors of observation, or the combination of measurements, 
or statistics, or statistical mechanics, or insurance. 

With a view to adding to the didactic value of the work 
I have introduced a certain number of exercises for the 
student. It would be easy to multiply these indefinitely. 
The few which I have chosen seemed to me particularly 
interesting, but that, perhaps, is a matter of individual 
preference. The teacher or student will find little difficulty 
in adding to the number. 

Paragraphs marked { are more difficult than the others, 
and may well be omitted by the beginner. 

There is little need for elaborate bibliographical notes as 
Czuber’s comprehensive report,* though not entirely free from 
mistakes, covers the ground thoroughly. 

J.L.C. 


CAMBRIDGE, U.S.A.,
December 1924. 


* Die Entwickelung der Wahrscheinlichkeitsrechnung und ihrer Anwendungen.
Jahresbericht der deutschen Mathematikervereinigung, Vol. vii, Part 2, Leipzig, 1899. 


CONTENTS


CHAPTER I
THE SCOPE AND MEANING OF MATHEMATICAL PROBABILITY

The uncritical view of probability as a measure of expectation
The necessity for objective definition
Definition based on strength of causes
The usual mathematical definition; critique of the same
First statistical assumption and definition; detailed discussion
Second empirical assumption, meaning and limitations
Discussion of the formula ‘equally likely’
Third assumption when the number of possibilities is definite

CHAPTER II
ELEMENTARY PRINCIPLES OF PROBABILITY

§ 1. Formulae for Combinations and Arrangements
Permutation formulae
Combination formulae, division into groups

§ 2. Simple Problems in Total and Compound Probability
Elementary cases
Theorem of total probability, special case
Theorem of compound probability
Tchebycheff’s problem
Theorem of total probability, general case

§ 3. Expectation
Definition of expectation, fair turns
Petrograd Paradox

§ 4. Risk
Definition of risk, simple applications

CHAPTER III
BERNOULLI’S THEOREM

§ 1. The Problem of Repeated Trials
Fundamental formulae for repeated trials
Maximum probability, discrepancy
Bernoulli’s theorem

§ 2. Stirling’s Formula
Meaning of an asymptotic approximation
Empirical determination of factors for a factorial
Stirling’s formula

§ 3. The Probability Integral
Probability for a discrepancy within given limits
Passage to integral form
Calculation of average values by the integral
Problems in discrepancy

§ 4. Games of Chance
Definition of problem, case of fair game
De Moivre’s device for handling an unfavourable game
Game of roulette
Chance of ruin at a particular turn


CHAPTER IV
MEAN VALUE AND DISPERSION

§ 1. Elementary Theorems in Mean Value
Definition of mean value, relation to average
Mean values of sums, products and squares
Tchebycheff’s inequality
Poisson’s law of large numbers

§ 2. Dispersion
Definition of dispersion, fundamental dispersion theorem
Application to groups of observations
Bernoulli, Lexis, and Poisson series
Cases of identical mean value, but varying dispersion

CHAPTER V
GEOMETRICAL PROBABILITY

Difficulty in the choice of independent variables
Paradoxes
Poincaré’s example where the choice of variable is immaterial
Simple examples where the choice is obvious
Buffon’s needle problem
Problems in crossing contours

CHAPTER VI
PROBABILITY OF CAUSES

Preliminary definitions
Bayes’ formula
Probable composition of an urn
Converse of Bernoulli’s theorem
Bayes’ formula for future events
Remarks


CHAPTER VII
ERRORS OF OBSERVATION

§ 1. Determination of the ‘Best Value’
Distinction between the two sorts of errors in physical measurements
First assumptions as to accidental errors
Definitions of best value
Determination of this as weighted mean
Mean, probable and average error, residual errors
Determination of mean error of weighted mean, and of individual observation
Principle of least squares
Median value

§ 2. The Law of Error
Three assumptions as to a law of error
Deduction of the law
Law of error for a linear combination of known observations
Formulae for precisions, mean, probable, and average errors
Example

§ 3. Doubtful Observations
Various proposed criteria for rejecting observations


CHAPTER VIII
ERRORS IN MANY VARIABLES

§ 1. Law of Error
Definitions of best value
Determination of the law of error for any number of variables
Determination of constants, case of two variables
Determination of constants, general case

§ 2. The Error Ellipse
Distribution of errors among ellipses
Determination of axes and angle
Examination of a particular case

§ 3. The Correlation Coefficient
Correlation of two characteristics
Correlation coefficient, its relation to probability ellipses
Statistical derivation of the correlation coefficient
Correlation ratio


CHAPTER IX
INDIRECT OBSERVATIONS

§ 1. The Least Square Method for Combining Indirect Observations
The general problem of indirect observations
Fundamental assumptions
Derivation of normal equations, general case
Normal equations when functions are linear
Example
General form of solution for normal equations
Precisions of the quantities calculated

§ 2. Conditioned Observations
General problem of conditioned observations
Examples

§ 3. Curve Fitting
Least square approximation to a given function by means of a polynomial
Example
Generalization, Fourier’s series
The method of moments


CHAPTER X
THE STATISTICAL THEORY OF GASES

§ 1. General Properties of Perfect Gases
Assumptions as to perfect gases, rebound of molecules

§ 2. Representation in Hyperspace
Definition of hyperspace
Gas model in 6m dimensions
Collision equations, invariance of volume
First probability assumption

§ 3. First Deduction of Maxwell’s Law
Application of local probability, search for the most probable distribution
Use of Stirling’s theorem, and conservation of energy
Determination of constants, final statement of the law
Maxwell’s original deduction

§ 4. Amplification of the [illegible] Proof
Investigation of the applicability of Stirling’s formula
Introduction of small variations of velocity coordinates

§ 5. Probability of a Nearly Normal State
Evaluation of the probability that the distribution is nearly normal
Statistical statement of the law of distribution of velocities

§ 6. Distribution in Space
Formula for the distribution of molecules in space


CHAPTER XI
THE PRINCIPLES OF LIFE INSURANCE

§ 1. The Calculation of Life Probabilities
Difficulties in the [illegible] of the [illegible] survival
Statistical methods
Makeham’s formula

§ 2. Endowments and Annuities
Compound interest and discount
Endowments, annuities, and annuities due
Expectation of life

§ 3. Single Payment Insurance
Single payment simple life policy
Increasing insurance, temporary and endowment insurance
Payment before the end of the year

§ 4. Premiums
Annual premiums on simple types of insurance
Return of premiums
Loading

§ 5. Surrender Values
Meaning of surrender value, different methods of calculation
Ownership of the reserve


TABLES. SUBJECT-INDEX. INDEX OF AUTHORS


CHAPTER I 


THE SCOPE AND MEANING OF MATHEMATICAL 
PROBABILITY 


Everybody has a pretty good working knowledge of the
meaning of the words ‘ probable’ and ‘improbable’. If a man 
be asked: ‘Is it probable that the sun will rise to-morrow ?’ 
‘Is it probable that you will be elected next Grand Lama of 
Tibet?’ he knows precisely what the question means, and 
is able to answer without hesitation. That is because the 
terms are used in a general sense, without any attempt at the 
refinement of accuracy needful for mathematical purposes. 
In exactly the same way, everybody understands the state- 
ment that Cap Gris-Nez is the nearest point in France to 
Great Britain. The trouble comes when we undertake to say 
what we mean by a mathematical point, and in like manner 
we encounter serious difficulty when we try to express 
probability in exact mathematical language. A brilliant con- 
temporary philosopher has defined mathematics as the science 
where we never know what we are talking about, or what 
our results mean; the calculus of probability is no exception 
to this pessimistic definition. 

How shall probability be defined as a mathematical term ? 
The first definition which we have to consider, and which is 
ascribed to James, alias Jacob, Bernoulli, is that probability 
is the measure of the strength of our expectation of a future 
event. If we feel almost sure that an event is going to 
happen, we say that it is highly probable; in the contrary 
case, we call it highly improbable; and if we are so inclined, 
we may express our expectation in the form of a bet, for or 
against the arrival of the event. Probability appears as the 
mathematical measure of our state of expectancy. 



The mere statement of such a definition is sufficient to 
raise a host of objections. If, as the statement suggests, 
probability is merely a sort of psychological coefficient, the
practical application of it should be in the psychological
laboratory, where it could be measured in the same way as
reaction time, intensity of response to stimulus, persistence of
illusion. But no large circle of persons is interested in any
such sort of probability as that. Moreover, different persons 
will assign different degrees of probability to the same future 
event, and the same person will feel differently about it at 
different times, according to his mood, or the state of his 
digestion. In consequence of this, the supporters of such 
a definition put in a qualifying adjective, saying that probability
is the measure of our ‘intelligent’ expectation of
a future event. This raises the question, ‘ What is intelligent 
expectation?’ The answer would seem to be that it is the 
expectation of an intelligent person, reasoning on the facts in 
the case, and not on his own personal hopes and fears. But 
if all intelligent persons will reach the same degree of 
expectancy when they reason intelligently on the facts, then 
the measure of this expectancy must be a function of the 
facts themselves, and not of the individual; who may thus 
be left out of account. 

A lineal successor of Bernoulli has appeared in very recent 
times in Keynes, whose remarkable book was referred to in the 
preface. This writer’s main thesis is that probability is not 
concerned with events, but with judgements or propositions.* 
This is a question of definition, and his point of view is 
certainly legitimate. But a science of the probability of 
judgements can scarcely be made a subject of exact mathematical
treatment, and this also is one of Keynes’s principal
contentions.† His method is, to use his own words:‡
‘To regard subjective probability as fundamental, and to
treat all other relative conceptions as derivative from this.’
It is perhaps open to question whether he has entirely answered
the difficulties raised against Bernoulli, but in any case


* Loc. cit., p. 5.
† Ibid., p. 34.
‡ Ibid., p. 282.




it is perfectly evident that a line of reasoning which starts 
from the premiss that a certain subject is non-mathematical, 
is not a good introduction to a mathematical treatment of 
that subject. 

A second definition, which has the great authority of John 
Stuart Mill, may be expressed as follows:* Let us suppose 
that an event depends upon a certain nexus of causes, each of 
measurable intensity. We measure the total field or extent 
of variation of these causes, and take this as denominator, 
while as numerator we take the measure of that field or 
extent of variation which will produce what we call a 
favourable outcome. The fraction is defined as the probability 
for a favourable result. For instance, if a coin be spun in the 
air, there is an equal chance that it will turn up head or tail, 
because the initial angular velocities of spin and translational 
velocities of the centre of gravity which will cause the coin to 
show a head, cover one-half the total ranges of angular 
and translational velocities which need be considered in the 
particular problem. 

The difficulties attending such a definition are so many that 
it is scarcely worth consideration. What antecedent events 
are to be classed as causes? Are there not different groups of 
independent events, each of which might be called the group 
of causes, and how do we know that the result will be 
independent of the choice of group? Moreover, it is by no 
means certain that the calculation will yield in every case 
the number which the uninitiated would designate as the 
probability for a favourable outcome. Suppose that in a 
certain congressional district there were five rock-ribbed
Republicans to each three stalwart Democrats. The forces
tending to elect a Republican congressman would seem to
bear a ratio of five to three to those tending to elect a
Democrat, but we should hesitate to say that there were
only five chances in eight that the successful candidate would
be a Republican. 
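
One hedged way to see why the ratio of forces need not be the probability is to compute the chance of a Republican majority under a simple model. The following Python sketch rests on an assumption that is not in the text: each voter independently votes Republican with chance 5/8. The function name `majority_probability` and the electorate sizes are hypothetical choices for illustration only.

```python
from math import comb


def majority_probability(n_voters, p_republican):
    """Chance that more than half of n_voters (assumed odd) vote
    Republican, if each voter independently votes Republican with
    chance p_republican. This independence is a modelling assumption."""
    q = 1.0 - p_republican
    return sum(
        comb(n_voters, k) * p_republican**k * q ** (n_voters - k)
        for k in range(n_voters // 2 + 1, n_voters + 1)
    )


if __name__ == "__main__":
    # With a single voter the chance equals 5/8; with a larger
    # electorate it moves far away from 5/8.
    for n in (1, 9, 101, 801):
        print(n, round(majority_probability(n, 5 / 8), 6))
```

Under this model the chance of a Republican victory equals five in eight only for an electorate of one voter, and rises well above it as the electorate grows. A different model of the causes would give a different figure, which is the difficulty raised above.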


* Logic, 8th ed., vol. ii, p. 72, London, 1872. Mill does not require all prob- 
abilities to come under this head, but insists that we have here the real 
reason for the persistence of statistical ratios. 




We now come to the definition of probability which is 
used in practically all mathematical treatises on the subject, 
and which can be put into the following words: ‘An event 
can happen in a certain number of ways, which are all equally
likely. A certain proportion of these are classed as favourable.
The ratio of the number of favourable ways to the total
number is called the probability that the event will turn out 
favourably.’ 

This definition is ever so much more clear-cut than the 
others, and is capable of immediate application in many cases. 
Nevertheless there are two objections which may be easily 
raised. The first is, ‘Exactly what is meant by the phrase 
“equally likely”?’ This question is of fundamental im- 
portance in our subject, and must be treated in detail later. 
The other criticism, on the strength of which we reject this 
definition, is that as it stands it excludes many probabilities 
which are, nevertheless, among the most important. As stated 
above, it is inapplicable to a large number of cases where the 
number of ways is infinite; also it excludes the whole field
of statistical probability. If we ask the probability that 
a letter in the lost-letter office contain money, we are asking 
a concrete and intelligible question, but to base the answer 
on the number of ways in which money can be put into 
or left out of a misdirected letter would be the height of 
absurdity. It is the desire to include all forms of probability 
under one definition that leads us to the form which we shall 
now explain. 


First empirical assumption. 

IF AN EVENT WHICH CAN HAPPEN IN TWO DIFFERENT WAYS
BE REPEATED A GREAT NUMBER OF TIMES UNDER THE SAME 
ESSENTIAL CONDITIONS, THE RATIO OF THE NUMBER OF TIMES 
THAT IT HAPPENS IN ONE WAY, TO THE TOTAL NUMBER OF 
TRIALS, WILL APPROACH A DEFINITE LIMIT, AS THE LATTER 
NUMBER INCREASES INDEFINITELY. 


Definition. 
The limit described in the first empirical assumption shall
be called the probability that the event shall happen in the
first way, under those conditions.*
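
The limiting ratio in this definition can be illustrated numerically. The following is a minimal sketch in Python, not part of the original text. It simulates repeated trials of an event with an assumed underlying chance `p` and reports the ratio of first-way outcomes to trials at several trial counts. The value of `p` is a modelling assumption, since in practice this number is unknown and is exactly what the limit is meant to define. The function name `frequency_ratios` and the fixed seed are hypothetical choices for illustration.

```python
import random


def frequency_ratios(p, checkpoints, seed=0):
    """Simulate repeated trials of an event that happens in the first
    way with assumed chance p. Return {n: ratio of first-way outcomes
    to n trials} for each trial count n in checkpoints."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    wanted = set(checkpoints)
    ratios = {}
    successes = 0
    for n in range(1, max(checkpoints) + 1):
        if rng.random() < p:
            successes += 1
        if n in wanted:
            ratios[n] = successes / n
    return ratios


if __name__ == "__main__":
    result = frequency_ratios(0.5, [10, 100, 10_000, 200_000])
    for n, ratio in result.items():
        print(f"{n:>7} trials: ratio = {ratio:.4f}")
```

A finite simulation of this kind can only suggest that the ratio settles down; it cannot establish the existence of the limit, which remains an empirical assumption.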

There are quite as many objections to this method of
defining probability as to any other, and we must set about 
defending it as best we can. To begin with, how do we know 
that the first empirical assumption is true? The only answer 
is that experience in many fields under all sorts of circumstances
has demonstrated its truth. Few laws of nature are
so well established as this, and we are not only justified, but 
compelled, when it appears to be at fault, to examine whether 
the conditions of experiment have not undergone unnoticed 
but important alterations. For instance, the forms of inad- 
vertence which cause people to leave their umbrellas in public 
conveyances are many and various, but the proportion of 
umbrellas to the total number of travellers in any particular 
locality is apparently fairly constant. An increase of marked 
amount would suggest the query as to whether the weather 
had not been unusually bad; a notable diminution would 
suggest either good weather or dishonest employees. 

The most serious difficulty with the definition is the 
following: What is meant by ‘the same essential conditions ’? 
No two experiments are ever performed under identical 
conditions. There are always slight changes in temperature, 
barometric pressure, the state of the experimenter’s digestion, 
the chemical composition of his blood. How can we tell 
what conditions are essential, and what ones are not? The 
objection is perfectly valid, and does not admit of any perfect 
refutation. We must recognize, however, that it does not 


* This is essentially the definition used, for the most part, in Mill's Logic, 
and defended in great detail in Venn’s Logic of Chance, 3rd ed., London, 1888. 
Neither of these writers, however, draws a sufficiently sharp distinction 
between a ratio and the limit of a ratio. A much more accurate statement
will be found in an article by Von Mises, ‘Grundlagen der Wahrscheinlichkeitsrechnung’,
Mathematische Zeitschrift, vol. v, pp. 58 ff. Keynes, loc. cit.,
part I, ch. viii, attacks it with the utmost vigour. His objections seem to me
to fall completely to the ground if one considers that probability has to do
with events, not with judgements. He refuses for philosophical reasons to 
consider the probability of events, he scarcely will acknowledge the existence 
of such a probability. I am equally sure that the probability of events is the 
only kind worthy of serious mathematical study. 




bear on the present question alone, but on the whole of 
experimental science. If we seek to determine any physical 
law by experimental means, we tacitly assume that such 
changes as occur in the conditions are immaterial to the
result. Without some such fundamental postulate all experimental
science would be impossible. Shall the psychologist,
experimenting on the sensitiveness of a patient to certain 
faint stimuli, give up all hope of learning the truth because 
between one experiment and another the earth will have 
performed a certain number of turns about its axis, and will 
have travelled a certain distance along its orbit, while the 
whole solar system will have made a certain progress through 
space? Evidently the postulate that we can distinguish 
between essential and unessential conditions lies at the basis 
of all inductive science, and cannot be charged up to the 
calculus of probability. 

Another criticism which can be levelled against our definition 
is the following. In making probability merely the limit of 
a statistical ratio, we exclude the possibility of ever determining 
a probability except as the result of a long series of experi- 
ments, and even then we could only determine it approximately. 
There are some writers who are frankly willing to accept this 
limitation,* but our own view is that we need not tie our 
hands quite to this extent. It is hard to base the probability 
that a card drawn at random from a pack should be black 
upon a series of experiments ad hoc, when we do not know 
whether such experiments have really ever been performed. 
We therefore make our 


Second empirical assumption. 

IF AN EVENT CAN HAPPEN IN A CERTAIN NUMBER OF WAYS, 
ALL OF WHICH ARE EQUALLY LIKELY, AND IF A CERTAIN 
NUMBER OF THESE BE CALLED FAVOURABLE, THEN THE RATIO 
OF THE NUMBER OF FAVOURABLE WAYS TO THE TOTAL NUMBER 
IS EQUAL TO THE PROBABILITY THAT THE EVENT WILL TURN
OUT FAVOURABLY.


* Venn, loc. cit. Mill, on the other hand, looks upon probabilities deter- 
mined by reasoning as more certain than those determined statistically. 




It cannot be too much emphasized that this is an empirical 
assumption based upon experience, exactly like the other. It 
tells us that the ratio of favourable outcomes to trials approaches, 
as a limit, the ratio of the number of favourable ways to total 
ways. It is sometimes assumed that the one or the other of 
these assumptions can be proved by Bernoulli's theorem, to be 
developed in a later chapter. This is pure illusion. No one’s 
theorem, based on a priori considerations, can prove that
in practice a coin will show heads about one-half the time. 
Moreover, a few moments’ reflection will show that in one 
guise or another we must have both of these assumptions. 
Without the second, we could never predict the probability of 
an outcome from the data; the matter would always have 
to be put to the test. Without the first assumption, there 
would be absolutely no connexion between the ratio of favour- 
able to total ways and the statistical ratio determined by 
practice ; a probability defined by the former would be an 
abstract number, having no practical significance.* 
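
As a small illustration of the second assumption, the probability that a card drawn at random from a pack is black can be computed by counting ways. This is a sketch in Python under the stated hypothesis that each of the 52 cards is equally likely to be drawn. The helper name `probability_by_counting` is hypothetical and does not come from the text.

```python
from fractions import Fraction
from itertools import product

RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
SUITS = ["clubs", "diamonds", "hearts", "spades"]


def probability_by_counting(ways, is_favourable):
    """Ratio of favourable ways to total ways, as an exact fraction.
    Only meaningful if all the ways are assumed equally likely."""
    ways = list(ways)
    favourable = sum(1 for way in ways if is_favourable(way))
    return Fraction(favourable, len(ways))


# Every (rank, suit) pair is one way in which the draw can happen.
pack = list(product(RANKS, SUITS))
p_black = probability_by_counting(
    pack, lambda card: card[1] in ("clubs", "spades")
)
print(p_black)  # 1/2
```

The count itself says nothing about what will happen in practice. It is the second assumption that identifies this fraction with the limit of the statistical ratio.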

There is one other very important point in this second 
assumption, which we mentioned above, and which we must 
now examine carefully. What is meant by ‘equally likely’ ? 
If we say that two ways are equally likely when the number 
of arrivals either way bears to the number of trials a ratio 
with the same limit, we are running around in a circle, and 
saying that if the limit of a certain ratio is the probability of 
success, why, then the probability of success is the limit of 
that ratio. No, if our second assumption is to tell us any- 
thing at all, we must mean something else by ‘ equally 
likely’. 

There has been a good deal of debate among philosophers as 
to just what meaning should be attached to these mystic words, 
and two sharply divergent views have been expressed, and 
ably defended. The first of these, which has the great 
authority of Laplace,† and has been vigorously defended


* This is admirably brought out by Cournot, Théorie des chances, Paris,
1843, pp. 437 ff.
† See his Théorie analytique des probabilités, Paris, 1812.




by Stumpf* is expressed by saying that two results are 
equally likely when we know that one of them must happen, 
but have no information leading us to expect the one rather 
than the other. Everybody will admit that this expresses 
a necessary condition that two events should be equally likely, 
the doubt is as to its sufficiency. Assuming that Mars is 
inhabited, what is the probability that the inhabitants are 
carnivorous? The most imaginative observer will acknowledge
that, as far as our present information goes, we are completely
in the dark on this interesting point, there is nothing to
guide our opinion. Shall we, therefore, say that the probability
that these enterprising engineers are carnivorous is
expressed by the fraction ½? And if we say so, shall we go on
to the assertion that if future astronomic research revealed to 
us that a large number of the heavenly bodies were inhabited 
we might expect to find carnivorous inhabitants in about one- 
half of them? Such an assumption is the merest juggling 
with words, and we do not hesitate to pronounce against the 
sufficiency of this condition.† It is not unnatural, then, that 
some philosophers have been led to the opposite extreme, and 
have maintained that we can only say that two events are 
equally likely if we are acquainted with all the causes tending 
to produce the one or the other, and know them to be of equal 
potency. We do not say that a spinning coin is equally likely 
to turn up head or tail because we know no reason to expect 
the one rather than the other; we make this affirmation only 
upon the hypothesis that it is a real coin and not a counterfeit, 
nearly homogeneous, with the centre of gravity near the 
middle, while the method of throwing is such that it has no 
tendency to favour the one face at the expense of the other. 
This idea was skilfully elaborated by Von Kries ‡ in his 
theory of ‘range’, which is essentially Mill’s idea of equal 
field of variation for forces. Two ways in which a thing can 


* Ueber den Begriff der mathematischen Wahrscheinlichkeit, Sitzungsberichte, Royal 
Bavarian Academy, Philosophical Class, 1892. 

† There is a good discussion of this point in Keynes, loc. cit., part I, ch. iv. 

‡ Prinzipien der Wahrscheinlichkeiten, Freiburg, 1886. The last chapter of 
this work contains an excellent historical summary of the various theories of 
probability. 




happen may be said to be equally likely when, and only when, 
we know that the fields of variation of the forces tending 
to produce the one or the other have equal content. 

It is certain that we have here the best possible way for 
determining whether two events are equally likely or not, 
when it can be applied ; unfortunately in many cases we have 
no complete information, and are tempted to fall back on 
the other principle, namely, that we have no reason to believe 
the range for one set of causes to be greater than that for the 
other. There is a subsidiary difficulty, which Von Kries 
himself recognized, and which raises fearful havoc in certain 
parts of the theory of probability. Suppose that we measure 
two ranges for a certain variable, and find them equal. We 
next replace that variable by a function of itself, and measure 
the corresponding ranges of this new variable. They may be 
very far from being equal to one another. Consequently two 
eventualities which seemed to be equally likely when stated 
in terms of the first variable, might appear far otherwise in 
terms of the second. For instance, suppose that we know 
that a certain variable lies between 10 and 1,000, the ranges 
10 to 100 and 100 to 1,000 are very different in magnitude, 
and would not seem to produce equally likely cases for any 
event dependent on them. But if we found that the natural 
measurement to make was not the variable itself, but its 
logarithm; if the variable appeared naturally as the anti- 
logarithm of a certain number, then the ranges of 1 to 2 and 
2 to 3 for the logarithm would seem to produce equally likely 
cases. 
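
The numerical instance just given can be checked in a few lines. The sketch below is only an illustration; the helper name `content` is our own and stands for the length of a range after the variable has been replaced by a function of itself.

```python
import math

def content(lo, hi, transform=lambda v: v):
    """Length of the range [lo, hi] after applying `transform` to the variable."""
    return transform(hi) - transform(lo)

ranges = [(10, 100), (100, 1000)]
linear = [content(lo, hi) for lo, hi in ranges]
logarithmic = [content(lo, hi, math.log10) for lo, hi in ranges]

print(linear)       # lengths measured on the variable itself
print(logarithmic)  # lengths measured on its common logarithm
```

Measured on the variable itself the second range is ten times the first; measured on the logarithm the two are of equal length, which is the whole of the difficulty.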

In spite of this difficulty, our own preference is strongly 
towards the latter form of definition. Not the least of its 
merits is that it is in objective, not subjective, shape, and so 
harmonizes with our general point of view. We shall say, 
then, that the words ‘ equally likely’ cannot be used unless the 
essential conditions governing the result are known, using the 
word essential in the same sense as in Assumption 1. It is 
conceivable that some of these essential conditions might tend 
to favour one outcome, some another. If nothing be known 
about the relative strength of these diverse tendencies, we 




cannot go further. But if we can say that the total resultant 
of the essential conditions does not tend to favour one outcome 
rather than the other, then the two may be said to be equally 
likely. 

Our second empirical assumption enables us to predict 
probabilities in cases where the number of ways in which an 
event can happen is finite. We need some corresponding 
assumption in the case where there are an infinite number 
of possibilities. What we shall need here is some assumption 
as to the probability that a group of variables should take 
values within a small neighbourhood of a given group. We 
have lurking in the background the same difficulty we saw 
above in finding two ranges leading to equally likely results ; 
at best we cannot make any very clear-cut hypothesis. 


Third empirical assumption. 

IF AN EVENT DEPEND UPON $n$ INDEPENDENT VARIABLES 
$X_1, X_2, \ldots X_n$, WHICH CAN VARY CONTINUOUSLY IN AN $n$-DIMENSIONAL 
CONTINUOUS MANIFOLD, THERE EXISTS SUCH AN ANALYTIC 
FUNCTION $F(X_1, X_2, \ldots X_n)$ THAT THE PROBABILITY FOR A 
RESULT CORRESPONDING TO A GROUP OF VALUES IN THE INFINITESIMAL 
REGION 

$$X_1 \pm \tfrac{1}{2}dX_1,\ X_2 \pm \tfrac{1}{2}dX_2,\ \ldots\ X_n \pm \tfrac{1}{2}dX_n$$

DIFFERS BY AN INFINITESIMAL OF HIGHER ORDER FROM 

$$F(X_1, X_2, \ldots X_n)\, dX_1\, dX_2 \cdots dX_n.$$

It must be acknowledged that as long as we are in total 
ignorance as to what function $F$ may be, this assumption 
does not lead us very far. The requirement that it should 
be analytic is unnecessarily strong, but we need a continuous 
function, and we can approach as near to such a function 
as we please by an analytic function. Moreover, the assumption 
is not quite so fruitless as one might fear. Let $p$ be 
the probability that the event take a form which we shall 
call favourable, and let this correspond to a region of 
variation $R$, then 

$$p = \int_R F(X_1, X_2, \ldots X_n)\, dX_1\, dX_2 \cdots dX_n.$$




If the total field of variation be $T$, we have 

$$1 = \int_T F(X_1, X_2, \ldots X_n)\, dX_1\, dX_2 \cdots dX_n.$$

Now let $x_1, x_2, \ldots x_n$ be such a set of independent variables, 
functions of the old ones, that regions $R$ and $T$ correspond to 
regions $r$ and $t$, and that 

$$\frac{\partial(x_1, x_2, \ldots x_n)}{\partial(X_1, X_2, \ldots X_n)} = F(X_1, X_2, \ldots X_n).$$

Then the probability for a set of variables lying in the 
favourable region is 

$$p = \frac{\int_r dx_1\, dx_2 \cdots dx_n}{\int_t dx_1\, dx_2 \cdots dx_n}, \tag{1}$$

and this is the ratio of the content of the desired manifold $r$ 
to the total manifold $t$, when measured in terms of the 
variables $x_1, x_2, \ldots x_n$. Or, to put the matter otherwise, the 
probability of a favourable outcome is the ratio of the content 
of the favourable range to that of the total range, when the 
right variables are chosen. In many cases, the mere state- 
ment of the problem leads naturally to the right variables ; 
it is only when there is considerable doubt as to which 
variables these are that the problem is obscure. And of 
course the correctness of any answer depends upon the 
correctness of the choice of variables. We shall explain 
these points in greater detail in a subsequent chapter, that 
dealing with geometrical probability. 
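
A one-dimensional illustration may make the change of variables less abstract. The sketch below assumes, purely for the sake of example, the density $F(X) = 2X$ on the total field from 0 to 1 and the favourable region from 0 to 1/2; the new variable $x = X^2$ is chosen so that $dx/dX = F(X)$. The names `integrate`, `density`, and `new_variable` are our own.

```python
def integrate(f, lo, hi, steps=100_000):
    """Midpoint-rule approximation to the integral of f over [lo, hi]."""
    width = (hi - lo) / steps
    return sum(f(lo + (k + 0.5) * width) for k in range(steps)) * width

density = lambda X: 2 * X          # assumed F(X) on the total field T = [0, 1]
new_variable = lambda X: X * X     # x(X), chosen so that dx/dX = F(X)

R = (0.0, 0.5)                     # favourable region in the old variable
T = (0.0, 1.0)                     # total field in the old variable

# Probability as a ratio of integrals of F over R and T.
p_by_integral = integrate(density, *R) / integrate(density, *T)

# Probability as a ratio of plain lengths, measured in the new variable.
p_by_content = (new_variable(R[1]) - new_variable(R[0])) / (
    new_variable(T[1]) - new_variable(T[0])
)
print(p_by_integral, p_by_content)
```

Both computations give one quarter: once the right variable is chosen, the probability is a simple ratio of contents.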

There is one other possible method of defining probability 
which should receive a passing notice. In modern discussions 
of the foundations of mathematics, we do not define points or 
numbers, except in the sense that we make certain independent 
postulates about them. In the same way, we might say that 
the probability of an event was a number which was a function 
of that event, and which obeyed certain formal laws of logic. 
This method, which we should like to see developed, would 
be unexceptionable from the point of view of abstract mathematics, 
but the real importance of the calculus of probability 
does not lie in any such field as that. We emphasize once 
more that we are dealing with what is somewhat loosely 
called an ‘applied science’, and the fundamental questions 
do not deal with the abstract philosophical nature of prob- 
ability, which always seems to remain somewhat obscure and 
elusive, but rather the meanings of numerical probabilities in 
specific cases. The purpose of this first chapter has been to 
develop a general definition for that meaning which should 
‘make sense’ in every case. 


CHAPTER II 




ELEMENTARY PRINCIPLES OF PROBABILITY 


§ 1. Formulae for Combinations and Arrangements. 


IF $n$ be a positive integer, we give the name factorial $n$ to 
the product $n \cdot (n-1) \cdot (n-2) \cdots 3 \cdot 2 \cdot 1$, and we have for 
this a special notation, namely 

$$n(n-1)(n-2) \cdots 3 \cdot 2 \cdot 1 = n! \tag{1}$$

It is convenient to extend this equation to the case where 
$n = 0$. Since $(n-1)! = n!/n$, we write 

$$0! = \frac{1!}{1} = 1.$$

The student must not forget that this is a definition, it is 
not a statement that if 0 be multiplied by the positive integers 
less than itself, the product is 1. 

Suppose that from $n$ distinguishable objects we pick $r$ 
objects, and arrange them in order; in how many ways can 
this be done? Evidently we have a choice of $n$ objects for 
the first place, $n-1$ for the second, &c. The number of 
arrangements is $n(n-1) \cdots (n-r+1)$. 

This is sometimes called the number of permutations of $n$ 
things taken $r$ at a time, and written $P^n_r$, but we do not need 
to burden ourselves either with the name or the symbol. 

A more interesting and important number is that which 
tells us in how many ways $r$ objects can be picked from $n$ 
objects, regardless of order. If this number be $x$, and if we 
subsequently multiply by the number of ways that $r$ objects 
can be arranged among themselves, the product is the number 
of arrangements of $r$ objects taken from $n$ objects, thus 

$$x \cdot r! = n(n-1) \cdots (n-r+1),$$
$$x = \frac{n(n-1) \cdots (n-r+1)}{r!},$$
$$x = \frac{n!}{r!\,(n-r)!}. \tag{2}$$
It is to be noted here that r and (n—r) appear sym- 
metrically, but that might have been foreseen, for the number 
of ways that we can pick r things to be taken from n things 
is the number of ways that we can pick (n—7r) to be left. 
The total number of ways in which something can be taken is 


$$\sum_{r=1}^{r=n} \frac{n!}{r!\,(n-r)!},$$

and this may be written, by the aid of the binomial theorem, 

$$(1+1)^n - 1 = 2^n - 1.$$


This again might easily have been foreseen, for each in- 
dividual object may be taken or left, irrespective of the others, 
but we must exclude the one case where all are left. 
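
These counting formulae are easily checked by machine. The following is a minimal sketch; the function names `arrangements` and `combinations` are our own labels for the two numbers discussed above, and the value $n = 6$ is chosen arbitrarily.

```python
import math

def arrangements(n, r):
    """Ordered selections of r objects from n: n(n-1)...(n-r+1)."""
    result = 1
    for k in range(n, n - r, -1):
        result *= k
    return result

def combinations(n, r):
    """Unordered selections, formula (2): n!/(r!(n-r)!)."""
    return math.factorial(n) // (math.factorial(r) * math.factorial(n - r))

n = 6
# r and n-r appear symmetrically in formula (2).
symmetric = all(combinations(n, r) == combinations(n, n - r) for r in range(n + 1))
# Each unordered selection yields r! arrangements.
consistent = all(
    arrangements(n, r) == combinations(n, r) * math.factorial(r) for r in range(n + 1)
)
# Total number of ways in which something can be taken.
total = sum(combinations(n, r) for r in range(1, n + 1))
print(symmetric, consistent, total, 2**n - 1)
```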

Let us return to formula (2). An easy and important 
extension is found as follows. In how many ways can n 
objects be divided into a group of $a$ objects, another of $b$ 
objects, another of $c$ objects, and so on? The first group can 
be chosen in $n!/a!\,(n-a)!$ ways. The second group can be 
taken from the remainder in $(n-a)!/b!\,(n-a-b)!$ ways, and 
so on. Multiplying together, we get 

$$\frac{n!}{a!\, b!\, c! \cdots} \tag{3}$$


There is one modification of this formula which is easily 
overlooked. Let $n = rs$, and let us imagine that we have 
$r$ groups, each of $s$ objects. The formula above will give as 
the number of ways $n!/(s!)^r$, and this answer is usually right. 
But in certain cases we may wish to make no distinction 
between the first group, the second group, &c., so that to get 
the answer, we should divide this by the number of ways in 
which the $r$ groups might be arranged in an order of preference, 
thus getting $n!/(s!)^r\, r!$. 

As an example, we see that the number of ways that four 
hands can be dealt in such a game as whist or bridge is 
$52!/(13!)^4$, for the situation of a hand with regard to the 
dealer is important. But if we ask in how many ways can 
52 cards be divided into four indistinguishable piles, the 
correct answer is $52!/(13!)^4\, 4!$. 

This number is a good deal smaller than the other, but is 
by no means a small number for all that. 
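
To see how large these two numbers are, one may evaluate formula (3) with exact integer arithmetic. The sketch below is illustrative only; `ordered_groups` is a name of our own choosing.

```python
import math

def ordered_groups(sizes):
    """Formula (3): n!/(a! b! c! ...) for distinguishable groups of the given sizes."""
    count = math.factorial(sum(sizes))
    for s in sizes:
        count //= math.factorial(s)   # each partial quotient is still an integer
    return count

deals = ordered_groups([13, 13, 13, 13])   # hands tied to seats at the table
piles = deals // math.factorial(4)          # four indistinguishable piles
print(deals)
print(piles)
```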

It is time to illustrate these principles with some examples. 


Example 1] In a certain company there are 15 men and 
10 women; in how many ways can a committee be 
picked including 3 men and 2 women? 

The answer is clearly the product of the numbers of ways 
in which the representatives from the two sexes can be 
chosen, namely 


$$\frac{15!}{12!\,3!} \times \frac{10!}{8!\,2!} = \frac{15 \times 14 \times 13}{1 \times 2 \times 3} \times \frac{10 \times 9}{2 \times 1} = 20{,}475.$$


Example 2] 3 travellers arrive at a town where there are 
5 inns; in how many different ways can they be 
lodged ? 

The natural way is to treat each traveller as an independent 
unit, capable of making 5 choices, thus getting $5^3 = 125$. But 
if we know further that the travellers have quarrelled on 
the road, so that no two will lodge at the same inn, the choice 


is reduced to 5x4x3 = 60. 


Example 3] In how many ways can all the letters of the word 
Mississippi be arranged ? 

If all of the letters were distinguishable, the number would 
be, clearly, $11!$, but we must divide this by the number of 
ways in which the indistinguishable i’s can be arranged, the 
indistinguishable s’s, and the p’s, getting 

$$11!/(4!)^2\, 2! = 34{,}650.$$
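
The three answers above may be confirmed by direct counting. This is a sketch only, using Python's standard library; the variable names are our own.

```python
import math
from itertools import permutations, product

# Example 1: 3 men from 15 and 2 women from 10.
committee = math.comb(15, 3) * math.comb(10, 2)

# Example 2: each of 3 travellers chooses among 5 inns.
lodgings = len(list(product(range(5), repeat=3)))    # inns may be shared
separate = len(list(permutations(range(5), 3)))      # no two share an inn

# Example 3: divide 11! by the factorial of each letter's multiplicity.
word = "Mississippi"
anagrams = math.factorial(len(word))
for letter in set(word):
    anagrams //= math.factorial(word.count(letter))

print(committee, lodgings, separate, anagrams)
```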




§ 2. Simple Problems in Total and Compound 
Probability. 


Example 4] 6 cards are chosen at random from a pack of 
52; what is the probability that 3 will be black and 
3 red ? 

The words ‘at random’ here signify that we consider all 
combinations of 6 equally likely in the sense explained in 
the last chapter. We have thus, by our first two empirical 
axioms, merely to find the ratio of favourable ways to total 
ways, namely 


$$\frac{(26!)^2}{(3!\,23!)^2} \cdot \frac{6!\,46!}{52!} = 0.332\ldots$$


Example 5] A card is chosen at random from each of 
6 packs ; what is the probability that 3 cards will be 
black and 3 red. 

In this case the total number of ways is $52^6$. To find the 
favourable ways we divide the 6 packs into 3 which are to 
show red, and 3 to show black, and multiply by $26^6$; the 
answer is 

$$\frac{6!}{(3!)^2} \cdot \frac{26^6}{52^6} = \frac{5 \times 4}{2^6} = 0.312\ldots$$

It would not have been easy to say off-hand which of these 
problems would have the larger answer. 

There are two general principles which are of fundamental 
importance in doing simple problems of the present sort; 
these must now be explained. Suppose that there are two 
events which are mutually exclusive, if the one happen the 
other cannot; and suppose that their respective probabilities 
are $p_1$ and $p_2$. Let there be a large number $N$ of trials, and 


Problems. 
1. In how many ways can a boat’s crew of 8 be chosen from 20 men ? 
2. In how many different ways can two dice appear? How many times 
will each possible sum appear ? 
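3. Prove the ‘multinomial theorem’, namely, 

$$(a+b+c+\ldots+l)^n = \sum \frac{n!}{\alpha!\,\beta! \cdots \lambda!}\, a^{\alpha} b^{\beta} \cdots l^{\lambda}, \qquad \alpha+\beta+\ldots+\lambda = n.$$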




let the first event happen $M_1$ times, while the second happens 
$M_2$ times. Then by the fundamental definition of probability 

$$\lim_{N\to\infty} \frac{M_1}{N} = p_1, \qquad \lim_{N\to\infty} \frac{M_2}{N} = p_2.$$

Now, one of the basic theorems of the infinitesimal calculus 
tells us that the limit of the sum of two variables dependent 
upon the same third variable is the sum of their limits, so that 

$$\lim_{N\to\infty} \left[\frac{M_1+M_2}{N}\right] = \lim_{N\to\infty} \left[\frac{M_1}{N} + \frac{M_2}{N}\right] = p_1 + p_2.$$

But the limit on the left is the probability that the one 
event or the other shall happen. We may apply this same 
principle to any number of mutually exclusive events, the 
probability that one of $n$ mutually exclusive events shall 
happen is the probability that a specified event 
shall happen plus the probability that some one of the other 
$n-1$ shall happen. Proceeding thus by a downward mathematical 
induction, we reach the 


Theorem of total probability, special case. 

The probability that one of any number of mutually exclusive 
events should happen is the sum of the probabilities for 
the separate events. 

(When we have a constantly increasing number of probabilities, 
each individual one decreasing indefinitely, if their 
sum approach a definite limit as the number increases indefinitely, 
that limit will be the probability in the limiting 
case.) 


Example 6] 2 dice are thrown; what is the probability that 
the sum shown will be 7 or 11? 

The sum 7 can be shown in six different ways, the sum 
11 in only two, hence $\frac{6}{36} + \frac{2}{36} = \frac{2}{9} = 0.222$. 
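
The special case of the theorem of total probability can be verified for this example by listing the 36 equally likely throws. The sketch below is illustrative; the helper `chance` is our own and simply takes the ratio of favourable to total ways.

```python
from fractions import Fraction
from itertools import product

throws = list(product(range(1, 7), repeat=2))   # 36 equally likely cases

def chance(event):
    """Ratio of favourable ways to total ways."""
    favourable = sum(1 for t in throws if event(t))
    return Fraction(favourable, len(throws))

p7 = chance(lambda t: sum(t) == 7)
p11 = chance(lambda t: sum(t) == 11)
p_either = chance(lambda t: sum(t) in (7, 11))   # the two sums exclude one another
print(p7, p11, p_either)
```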

Here is the second principle. Suppose that we have a com- 
pound event which is the result of the combination of two 
other events. We shall assume that these two are mutually 
independent. What does that phrase mean? If we take 
a mechanistic view of the universe, no one event is ever 
independent of any other, the outcome of any event must 
have a definite effect on all future history. Nevertheless, the 
uncorrupted man of common sense has a perfectly definite 
idea of what he means by saying that two events are mutually 
independent. His meaning may be expressed by the following: 

Definition] Two events are said to be mutually independent 
when the probability for either is the same whether the 
other happen or not. 


We take it as an empirical fact that there are such events 
in the universe, and that we can tell them when we see them. 
Suppose, then, that we have two mutually independent 
events, the first with the probability $p_1$, the second with the 
probability $p_2$, what is the probability $p_{12}$ for the arrival of 
the compound event which consists in the arrival of both? 
Let there be a large number $N$ of trials. Let the first one 
happen $M_1$ times, the second happen $M_2$ times, while both 
happen $M_{12}$ times. Since the probability for the second is the 
same whether the first happen or not, 

$$p_2 = \lim_{N\to\infty} \frac{M_2}{N} = \lim_{M_1\to\infty} \frac{M_{12}}{M_1},$$

$$p_{12} = \lim_{N\to\infty} \frac{M_{12}}{N} = \lim_{N\to\infty} \left[\frac{M_1}{N} \cdot \frac{M_{12}}{M_1}\right].$$

Now the limit of the product of two variables is the 
product of their limits, hence $p_{12} = p_1 \cdot p_2$. 


Theorem of compound probability. 


If a compound event consist in the conjunction of any number 
of independent events, the probability of the compound 
event is the product of the probabilities for the individual 
events. 

Strictly speaking, we have only proved this in the case of 
two independent events, but the reader will find that the 
previous proof by mathematical induction will apply absolutely 
in this case also. 
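
The definition of independence and the theorem of compound probability can both be seen at work on two dice. The sketch below is only an illustration: the two events (first die even, second die a six) and the helper `chance` are our own choices.

```python
from fractions import Fraction
from itertools import product

cases = list(product(range(1, 7), repeat=2))   # 36 equally likely cases

def chance(event, given=lambda c: True):
    """Chance of `event` among the equally likely cases satisfying `given`."""
    pool = [c for c in cases if given(c)]
    return Fraction(sum(1 for c in pool if event(c)), len(pool))

first_even = lambda c: c[0] % 2 == 0
second_six = lambda c: c[1] == 6

p1 = chance(first_even)
p2 = chance(second_six)
p2_given_1 = chance(second_six, given=first_even)   # unchanged by the first event
p12 = chance(lambda c: first_even(c) and second_six(c))
print(p1, p2, p2_given_1, p12)
```

The chance of the second event is the same whether the first happen or not, and the chance of both is the product of the separate chances.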




Example 7] A die is thrown 12 times; what is the probability 
that the face 4 will appear just twice ? 

There are various ways of showing 2 fours and 10 not 
fours, all mutually exclusive and equally likely. Hence the 
answer is the probability of starting with 2 fours, and then 
running 10 not fours, multiplied by the number of ways that 
two turns can be chosen from 12, thus 

$$\frac{12 \times 11}{2} \times \left(\frac{1}{6}\right)^2 \left(\frac{5}{6}\right)^{10} = 0.296\ldots$$


Example 8] A throws 3 coins, B throws 2; what is the 
chance that A will throw a greater number of heads 
than B? 


Note the wording of the problem; A is not to throw as 
many or more heads, but actually a greater number. This 
can be done in three mutually exclusive ways. We give them, 
with their chances : 


A throws 3 heads                      1/8 × 1   = 4/32. 
A throws 2 heads, B does not          3/8 × 3/4 = 9/32. 
A throws 1 head, B throws 2 tails     3/8 × 1/4 = 3/32. 


Total probability 16/32 = 1/2. 
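
Examples 7 and 8 lend themselves to an exact check. The sketch below uses rational arithmetic and, for the coins, a complete list of the $2^5$ equally likely outcomes; the variable names are our own.

```python
from fractions import Fraction
from itertools import product
from math import comb

# Example 7: exactly two fours in twelve throws of a die.
two_fours = comb(12, 2) * Fraction(1, 6) ** 2 * Fraction(5, 6) ** 10

# Example 8: list the 2**5 equally likely outcomes of five coins;
# the first three belong to A, the last two to B (1 stands for heads).
outcomes = list(product((0, 1), repeat=5))
a_ahead = sum(1 for o in outcomes if sum(o[:3]) > sum(o[3:]))
a_wins = Fraction(a_ahead, len(outcomes))

print(float(two_fours), a_wins)
```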


Example 9] A card is drawn at random from a pack and 
replaced, then a second drawing is made, and so on. 
How many drawings must be made in order to have 
a chance of 1/2 that the ace of spades shall appear at 
least once? 

It is assumed that the cards are properly shuffled after each 
drawing. The different drawings are, thus, independent 
events, with the same probabilities each time. The chance 
that the ace of spades will never appear in $n$ drawings is $(\frac{51}{52})^n$. 

We desire the contrary of this, namely, 

$$1 - \left(\frac{51}{52}\right)^n = \frac{1}{2}, \qquad n = \frac{\log 2}{\log 52 - \log 51} \approx 35.7,$$

that is, a little under 36. 




Example 10] In how many throws with a single die is there 
an even chance that the number 6 will appear at least 
once? 

$$\left(\frac{5}{6}\right)^n = \frac{1}{2}, \qquad n = \frac{\log 2}{\log 6 - \log 5} \approx 3.8.$$


Example 11] 2 dice are thrown; in how many turns is 
there an even chance that double sixes will appear at 
least once? 

$$\left(\frac{35}{36}\right)^n = \frac{1}{2}, \qquad n = \frac{\log 2}{\log 36 - \log 35} \approx 24.6.$$


These examples are of not a little historical interest. Two 
dice can appear in six times as many ways as one die; with 
one die there is more than an even chance to see the six in 
four throws, while with two dice there is less than an even 
chance to see double sixes in six times four, or twenty-four 
throws. This simple fact is known as the ‘paradox of 
Chevalier de Méré’, about which Pascal wrote to Fermat : * 

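‘Voilà quel étoit son grand scandale, qui lui faisoit dire 
hautement que les propositions n’étoient pas constantes, et 
que l’arithmétique se démentoit.’ [That was his great scandal, which made him say loudly that propositions were not consistent, and that arithmetic contradicted itself.] 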
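
The figures behind the paradox are quickly reproduced. The following sketch is illustrative only; the two helper functions are our own, and natural logarithms are used since only the ratio of logarithms matters.

```python
import math

def turns_for_even_chance(p):
    """Solve (1 - p)**n = 1/2 for n, where p is the chance of success in one turn."""
    return math.log(2) / -math.log(1 - p)

def at_least_once(p, n):
    """Chance that an event of probability p appears at least once in n independent turns."""
    return 1 - (1 - p) ** n

for label, p in [("ace of spades", 1 / 52), ("six", 1 / 6), ("double six", 1 / 36)]:
    print(label, round(turns_for_even_chance(p), 2))

# Four throws of one die against twenty-four throws of two dice.
print(at_least_once(1 / 6, 4), at_least_once(1 / 36, 24))
```

The first chance in the last line lies a little above one-half and the second a little below, although twenty-four is six times four.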


Example 12] Three players A, B, and C play under the 
following conditions. In each turn the chance for 
success is the same for each of two contestants. A and 
B play together the first turn, the winner plays with C, 
and if he win again he wins the game; if not C plays 
with the third man and so on until one man has won 
two turns in succession; what is the chance for each 
player? 

Let us begin by showing that there is a zero chance that 
the game will go on for ever. The only way that this could 
happen would be for the winner of each turn to be other than 
the man who won the turn before, and the chance for that is 

$$\frac{1}{2} \times \frac{1}{2} \times \frac{1}{2} \times \cdots = 0.$$


* Pascal, Œuvres, edition of 1819, vol. iv, p. 367. 




A and B have equal chances. We first find C’s chance, 
then one-half the difference between that and 1 is the chance 
of A or B. C might win his first two turns. Or he might 
win his first, lose his second, and win his third and fourth, 
after the man who beat him at his first turn has been defeated 
by the other man, and so on. His chance is thus 

$$\frac{1}{2^2} + \frac{1}{2^5} + \frac{1}{2^8} + \cdots = \frac{1}{4}\left[1 + \frac{1}{8} + \frac{1}{64} + \cdots\right] = \frac{1}{4} \cdot \frac{8}{7} = \frac{2}{7}.$$

Chance for A or B is $\frac{5}{14}$. 
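
The same chances can be reached without summing the series, by following the game turn by turn. The sketch below is one possible way of doing so, not the method of the text: it cuts the game off after a fixed number of turns (an assumption made only to keep the computation finite), so the results fall short of the true values by less than the chance that the game lasts so long.

```python
from fractions import Fraction

def winning_chances(max_turns=60):
    """Chances of A, B, C when the game is cut off after `max_turns` further turns."""
    half = Fraction(1, 2)
    win = {"A": Fraction(0), "B": Fraction(0), "C": Fraction(0)}
    # State after the first turn: (winner of the last turn, next opponent, idle man).
    states = {("A", "C", "B"): half, ("B", "C", "A"): half}
    for _ in range(max_turns):
        following = {}
        for (holder, challenger, idle), prob in states.items():
            win[holder] += prob * half                 # holder wins twice running
            key = (challenger, idle, holder)           # challenger wins and plays the idle man
            following[key] = following.get(key, Fraction(0)) + prob * half
        states = following
    return win

chances = winning_chances()
print({name: float(p) for name, p in chances.items()})
```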


Tchebycheff’s Example] What is the probability that two 
integers chosen at random shall be relatively prime ? * 


The chance that the first integer shall be divisible by 
a prime $r$, is the chance that its remainder, when divided by $r$, 
should be equal to zero. Assuming, then, that all remainders 
are equally likely, the chance is $1/r$. Hence the chance that 
$r$ is not a common factor of the two is $1 - 1/r^2$. 


* Cf. Markoff, loc. cit., p. 148. 


Problems. 


1. Let n dice be thrown. In how many throws is there an even chance 
that all will appear sixes at least once. Show that this number is not 
proportional to n as Chevalier de Méré supposed. 


2. A popular, if unaristocratic game called ‘craps’ is played as follows. 
Two dice are thrown, and one of the players will win if (a) the sum be 7 or 
11, (b) if the sum be 4, 5, 6, 8, 9, or 10, and the same sum reappears before 
7 is ever seen. What is the chance that this player will win ? 


3. In 1921 Lieutenant R. S. Hoar, U.S.A., drew five cards from a pack 
1,000 times, with the following results. Two were of the same denomina- 
tion with three scattering 412 times, three were of the same denomination 
and two scattering 23 times; two were of one denomination, two of another, 
and the fifth of a third 5 times, three of one denomination and two of 
another 1 time. Compare these figures with the numbers to be expected 
by calculation. 




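Our required probability is, then, 

$$p = \left(1-\frac{1}{2^2}\right)\left(1-\frac{1}{3^2}\right)\left(1-\frac{1}{5^2}\right)\left(1-\frac{1}{7^2}\right)\cdots$$

$$\frac{1}{p} = \text{Limit (for all primes)} \left[\lim\left(1+\frac{1}{2^2}+\frac{1}{2^4}+\cdots\right) \times \lim\left(1+\frac{1}{3^2}+\frac{1}{3^4}+\cdots\right) \times \cdots\right]$$

These series are absolutely convergent, so that we are 
allowed to rearrange the order of the terms and change the 
order of the limits.* 

$$\frac{1}{p} = \lim\left(1+\frac{1}{2^2}+\frac{1}{3^2}+\frac{1}{4^2}+\cdots\right).$$

$$\int_0^1 x^n \log x\, dx = -\frac{1}{(n+1)^2},$$

as an integration by parts shows, since $\lim_{x\to 0} x^{n+1} \log x = 0$. 

$$\int_0^1 \log x\,(1+x+x^2+\cdots)\, dx = -\left(1+\frac{1}{2^2}+\frac{1}{3^2}+\frac{1}{4^2}+\cdots\right).$$

$$\int_0^1 \frac{\log x\, dx}{1-x} = -\frac{\pi^2}{6}.$$ † 

$$p = 6/\pi^2 = 0.6079.$$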
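
The limiting value may be compared with a direct count. The sketch below assumes, as a finite stand-in for "integers chosen at random", that both integers are taken from 1 up to some limit, all pairs being equally likely; the function name `coprime_fraction` is our own.

```python
import math

def coprime_fraction(limit):
    """Fraction of ordered pairs (a, b) with 1 <= a, b <= limit that are relatively prime."""
    coprime = sum(
        1
        for a in range(1, limit + 1)
        for b in range(1, limit + 1)
        if math.gcd(a, b) == 1
    )
    return coprime / limit**2

for limit in (10, 100, 1000):
    print(limit, coprime_fraction(limit))
print("6/pi^2 =", 6 / math.pi**2)
```

As the limit grows the counted fraction settles towards $6/\pi^2$.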


In the special case of the theorem of total probability, we 
calculated the chance that one of a number of mutually 
exclusive events might occur. Since the events are mutually 
exclusive, it is the same thing to calculate the probability 
that one should happen, or that at least one should happen. 
When, however, they are not mutually exclusive, the two 
probabilities are quite different. It is now time to take up 
this ‘at least one’ question in the general case. 


* Cf. Tannery, Théorie des fonctions d’une variable, 2nd ed., Paris, 1904, 
vol. i, pp. 152 ff. 

† Cf. B. O. Peirce, Short Table of Integrals, 2nd revised ed., Boston, 1910, 
p. 64. 




We begin with two events. Let their probabilities be $p_1$ 
and $p_2$, while the probability that both will happen is $p_{12}$. 
The probability that at least one will occur is the probability 
of the arrival of one of three mutually exclusive events, 
namely, both happen, the first happens and the second fails, 
the first fails and the second happens. Moreover, the probability 
that the first happens is the sum of the probability 
that both happen and the probability that the first happens 
and the second fails; this latter probability will, then, have 
the value $p_1 - p_{12}$. 

The probability that we seek will thus be 

$$(p_1 - p_{12}) + (p_2 - p_{12}) + p_{12} = p_1 + p_2 - p_{12}.$$
Let us now assume that when $n-1$ events are concerned, 
the probability that at least one happens is 

$$P_{n-1} = \sum_{i=1}^{n-1} p_i - \frac{1}{2!} \sum_{i,j=1}^{n-1} p_{ij} + \frac{1}{3!} \sum_{i,j,k=1}^{n-1} p_{ijk} - \cdots$$

where in any one term $i$, $j$, $k$, &c. take on distinct values. 

By the same process of reasoning, when an $n$th event is 
introduced, the probability that this, and at least one other 
will occur is 

$$P_{(n-1)n} = \sum_{i=1}^{n-1} p_{in} - \frac{1}{2!} \sum_{i,j=1}^{n-1} p_{ijn} + \frac{1}{3!} \sum_{i,j,k=1}^{n-1} p_{ijkn} - \cdots$$

The probability that at least one of $n$ events will occur 
is, by the first case, 

$$P_n = P_{n-1} + p_n - P_{(n-1)n} = \sum_{i=1}^{n} p_i - \frac{1}{2!} \sum_{i,j=1}^{n} p_{ij} + \frac{1}{3!} \sum_{i,j,k=1}^{n} p_{ijk} - \cdots \tag{4}$$


Problem. 


What form does formula (4) take: (a) when the events are mutually 
exclusive, (b) when they are independent? Prove your answer in each 
case. 




Theorem of total probability, general case. 


If $n$ different events be under consideration, and if the 
probability for the simultaneous occurrence of the $i$th, 
$j$th, $k$th, &c. event be $p_{ijk\ldots}$, then the probability for 
the occurrence of at least one of these events is given by 
formula (4). 
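
Formula (4) can be tried on a small case where everything is countable. The sketch below takes three throws of a die and three overlapping events of our own invention; it sums over each unordered group of $k$ events once, which is the same thing as the ordered sums of formula (4) divided by $k!$.

```python
from fractions import Fraction
from functools import reduce
from itertools import combinations, product

# Sample space: three throws of a die, all 216 cases equally likely.
cases = set(product(range(1, 7), repeat=3))

# Three overlapping events, each given as its set of favourable cases.
events = [
    {c for c in cases if c[0] == 6},            # first throw shows six
    {c for c in cases if c[0] + c[1] == 7},     # first two throws sum to seven
    {c for c in cases if c[2] >= c[1]},         # third throw not below the second
]

def chance(subset):
    return Fraction(len(subset), len(cases))

def at_least_one(events):
    """Alternating sum of the chances of all joint occurrences, as in formula (4)."""
    total = Fraction(0)
    for k in range(1, len(events) + 1):
        sign = 1 if k % 2 else -1
        for group in combinations(events, k):
            total += sign * chance(reduce(set.intersection, group))
    return total

direct = chance(set.union(*events))   # count the cases favouring at least one event
print(at_least_one(events), direct)
```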

There are not a great many interesting applications of this 
beautiful general formula, largely owing to the difficulty of 
calculating the different $p$’s. We shall, however, give two. 
The first is an example worked out by De Montmort nearly 
two hundred years ago.* 


De Montmort’s Example] If $n$ balls in an urn be numbered 
1, 2, 3, ... $n$ respectively, and if they be drawn out at 
random, one after another, what is the probability that 
at least one will appear in the turn corresponding to 
its number ? 


The probability that a specific set of $k$ balls shall come 
out in the right order is 

$$\frac{(n-k)!}{n!}.$$

The probability that some one set of $k$ will come in order 
is this number multiplied by the number of ways that $k$ 
objects may be chosen from $n$ objects, namely, 

$$\frac{n!}{k!\,(n-k)!} \cdot \frac{(n-k)!}{n!} = \frac{1}{k!}.$$

Our required probability is, thus, 

$$1 - \frac{1}{2!} + \frac{1}{3!} - \cdots + \frac{(-1)^{n-1}}{n!}.$$

The probability that no ball will come in the right place is 

$$\frac{1}{2!} - \frac{1}{3!} + \frac{1}{4!} - \cdots + \frac{(-1)^{n}}{n!}.$$


These are the first terms of a familiar rapidly converging 
series, the difference between the sum written above and the 
sum to infinity being less than $1/(n+1)!$; we thus get the 
curious 


* Essai d’analyse sur les jeux de hasard, 2nd ed., Paris, 1748, p. 1382. 




Theorem] If any large number of balls be numbered 1, 2, ... $n$, 
and if they be drawn out one after another from an 
urn, the probability that no ball will appear in the 
turn corresponding to its number is very close to $1/e$. 

A colleague of the author’s once stated the theorem in the 
following more picturesque language: 

‘If all of the inhabitants of Chicago should meet together 
in one place and get extremely drunk, and then try to go 
home by guess-work, the chances that at least one would get 
back to his own bed are almost two out of three.’ 

This is one of those cases where it is fortunate that the 
probability can be calculated beforehand, and we are not 
forced to seek it experimentally. 
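
For a small number of balls the calculation can nevertheless be confirmed by listing every possible order of drawing. The sketch below is illustrative only; the two function names are our own.

```python
import math
from fractions import Fraction
from itertools import permutations

def no_match_by_counting(n):
    """Chance that no ball appears in its own turn, by listing all n! orders."""
    orders = list(permutations(range(n)))
    misses = sum(
        1 for order in orders if all(ball != turn for turn, ball in enumerate(order))
    )
    return Fraction(misses, len(orders))

def no_match_by_series(n):
    """The same chance from the series 1/2! - 1/3! + ... +- 1/n!."""
    return sum(Fraction((-1) ** k, math.factorial(k)) for k in range(2, n + 1))

for n in range(2, 8):
    print(n, no_match_by_counting(n), float(no_match_by_series(n)))
print("1/e =", 1 / math.e)
```

Already with seven balls the chance agrees with $1/e$ to four places of decimals.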

The general theorem of total probability enables us to set 
a limit, unfortunately not a very close one, to the size of 
a composite probability, when we know the values of the 
individual probabilities involved, but do not know to what 
extent they depend on one another. 


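$$p_1 + p_2 - p_{12} \le 1,$$
$$p_{12} \ge p_1 + p_2 - 1.$$
Assume 
$$p_{12\ldots n-1} \ge p_1 + p_2 + \ldots + p_{n-1} - (n-2),$$
$$p_{12\ldots n} \ge p_{12\ldots n-1} + p_n - 1,$$
$$p_{12\ldots n} \ge p_1 + p_2 + \ldots + p_n - (n-1). \tag{5}$$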


§ 3. Expectation. 


Definition] If a person have the chance $p_1$ to receive the 
positive or negative sum $s_1$, $p_2$ to receive $s_2$, ... $p_n$ to receive 
the sum $s_n$, and if these be the only sums he has a chance 
to receive under the circumstances, then the sum 

$$\sum_{i=1}^{n} p_i s_i$$

is called his expectation under the circumstances. 


Problem. 

If it be found that 91% of the recruits of an army satisfy the first of 
three medical requirements, 86% satisfy the second, and 83% satisfy the 
third, what will be a lower limit for the proportion of those satisfying all 
three ? 




Theorem 1] The expectation is the limit of the average sum 
received as the number of trials increases indefinitely. 


To prove this let us notice that if in $N$ trials, the sum $s_1$ be 
received $T_1$ times, $s_2$, $T_2$ times, ... $s_n$, $T_n$ times, the average 
amount received is $(T_1 s_1 + T_2 s_2 + \ldots + T_n s_n)/N$. The limit of 
this sum is the sum of the limits of the individual terms, and as 

$$\lim_{N\to\infty} \frac{T_i}{N} = p_i,$$

we have our theorem proved. 

The subject of expectation is used especially in connexion 
with games of chance. This branch of the theory of probabi- 
lity has always had a peculiar fascination for a certain type 
of reader, and was, moreover, the historic basis of the whole 
science. We shall therefore pay some attention to it both in 
the present chapter and in subsequent ones, even though at 
the present time the calculus of probabilities is principally 
occupied with more serious matters. 

Definition] A turn at a game of chance is said to be 
fair to a prospective player, when his expectation is 0; it 
shall be called favourable, when his expectation is positive, 
otherwise unfavourable. In the same way a whole game 
shall be called fair, favourable, or unfavourable, according to 
the expectations. 

Suppose, for instance, that a player stake a sum $a$, with 
a chance $p$ of winning his adversary’s stake $b$, while the 
chance of loss, a tie being excluded, is $q$. His expectation is 

$$pb - qa.$$

If the turn be fair this is 0, and 

$$p/q = a/b; \qquad p/a = q/b. \tag{1}$$


Theorem 2] If a turn be fair to a player, it is fair to his 
adversary, and the probability of success for each is 
proportional to his stake. 

Suppose that a player plays two successive turns; let the 
probabilities and the stakes be $p_1, q_1, a_1, b_1$ in the first, and 
$p_2, q_2, a_2, b_2$ in the second. Let us find his total expectation. 

$$p_1 + q_1 = p_2 + q_2 = 1,$$
$$p_1 p_2 (b_1 + b_2) + p_1 q_2 (b_1 - a_2) + p_2 q_1 (b_2 - a_1) - q_1 q_2 (a_1 + a_2) = (p_1 b_1 - q_1 a_1) + (p_2 b_2 - q_2 a_2).$$
Evidently we might carry on to any number of turns, by 
mathematical induction : 


Theorem 3] A player's expectation from a series of turns is 
the sum of his expectations from the individual turns. 

Theorem 4] Any succession of fair turns, favourable turns, 
or unfavourable turns, will constitute a fair game, 
a favourable game, or an unfavourable game, as the 
case may be. 


This theorem is vitally important, and shows the utter 
futility of a player’s altering the amount of his stakes in any 
game, in the hope of improving his chances. We shall discuss 
this question in greater detail in the next chapter. 


Example 13] <A has three pennies, B has two. The coins are 
all thrown, and it is agreed that the player showing 
the greatest number of heads shall win all; in case 
of a tie B shall win. How is this game from A’s 
point of view ? 

A can win (a) by throwing three heads, for which the
chance is $\tfrac{1}{8}$, (b) by throwing two heads to one or no heads,
chance $\tfrac{3}{8} \times \tfrac{3}{4}$, or by throwing one head against two tails,
chance $\tfrac{3}{8} \times \tfrac{1}{4}$. His total chance is $\tfrac{1}{2}$, and his expectation

$$\tfrac{1}{2} \cdot 2 - \tfrac{1}{2} \cdot 3 = -\tfrac{1}{2}.$$

The game is, thus, unfavourable to A, a rather surprising
result.
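The result of Example 13 may be checked by listing all 32 equally likely throws. The following is only a sketch; the function name `chance_a_wins` and its parameters are illustrative and not part of the text.

```python
from fractions import Fraction
from itertools import product

def chance_a_wins(coins_a=3, coins_b=2):
    """Exact chance that A shows strictly more heads than B (a tie goes to B)."""
    wins = 0
    total = 0
    for throw in product((0, 1), repeat=coins_a + coins_b):
        heads_a = sum(throw[:coins_a])   # A's coins come first
        heads_b = sum(throw[coins_a:])   # B's coins follow
        total += 1
        if heads_a > heads_b:
            wins += 1
    return Fraction(wins, total)

p = chance_a_wins()
# A stakes his 3 pennies against B's 2 pennies.
expectation = p * 2 - (1 - p) * 3
print(p, expectation)  # 1/2 -1/2
```

The enumeration treats every arrangement of the five coins as equally likely, which is the assumption made in the example.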


Problem.
How will the game appear to A if they agree to begin again in case of
a tie?

Petrograd paradox. A spins a penny, and agrees to give
it to B if it come up heads. If it do not come up
heads till the second time he will give B 2 pence, if not
till the third time 4, if not till the nth $2^{n-1}$. How
much should B pay for the privilege of taking part in
this pleasant game?


The game will be fair if B agree to pay his expectation as
an entrance fee, let us therefore calculate this expectation;
it is clearly

$$\frac{1}{2} \cdot 1 + \frac{1}{4} \cdot 2 + \frac{1}{8} \cdot 4 + \dots + \frac{1}{2^n} \cdot 2^{n-1} + \dots = \frac{1}{2} + \frac{1}{2} + \frac{1}{2} + \dots$$

The absurdity of this answer constitutes the paradox which 
has given rise to a good deal of discussion, serving Daniel 
Bernoulli as the basis of his theory of moral value.* Theo- 
retically B’s expectation is infinite; practically, as Bertrand 
remarks,† any one would be a fool to risk 100 pence at any
such game. B’s expectation is infinite provided the possibility 
of an infinite number of turns is admitted, and provided, of 
course, that he has an infinite fortune to start with. Neither 
of these provisos is related to actual life. Let us see how 
many times the coin will be spun on an average before heads 
come up. This is a problem in mean value of the sort that 
we shall take up at length in Ch. IV, but it will be sufficient 
for our present purposes to notice that this number is the 
expectation of a man who shall receive a penny if heads 
appear the first time, two, if not till the second, three, if not 
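till the third, &c. His expectation is, then,

$$\frac{1}{2} \cdot 1 + \frac{1}{2^2} \cdot 2 + \frac{1}{2^3} \cdot 3 + \dots + \frac{1}{2^n} \cdot n$$
$$= \frac{1}{2} + \frac{2}{2^2} + \frac{3}{2^3} + \frac{4}{2^4} + \dots + \frac{n}{2^n}$$
$$= 2 - \frac{n+2}{2^n}.$$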


* ‘Specimen Theoriae novae de Mensura Sortis’, Commentarii Academiae
Scientiarum Imperialis, vol. v, p. 175, Petrograd, 1738. For further references,
see Czuber, Entwicklung, cit. pp. 122 ff.

† loc. cit., p. 63.

The average number of turns will not, therefore, exceed
two. Suppose that in the lifetime of A and B it will be
possible to play $2^n$ games. As a further simplification, we
suppose that B wins one penny about half the time, twopence
one-quarter of the time, fourpence one-eighth of the
time. If he pay an entrance fee of $a$ each time, and if this
be a fair fee for $2^n$ games, we have


$$2^n a = 2^{n-1} \cdot 1 + 2^{n-2} \cdot 2 + 2^{n-3} \cdot 4 + \dots + 1 \cdot 2^{n-1} + \Delta.$$

The reason for the remainder term $\Delta$ is that

$$2^n = 2^{n-1} + 2^{n-2} + \dots + 2 + 1 + 1,$$

and we do not know what sum to ascribe to the odd turn,
but surely $\Delta < 2^n$,

$$2^n a = n\,2^{n-1} + \Delta, \qquad a < \frac{n}{2} + 1.$$


Now if 2” be allowed to increase indefinitely, so will a, 
but that is not our present hypothesis. Let A and B play 
100 turns per hour, working 8 hours a day, 300 days in the 
year, for 50 years, the number of games would be 12,000,000,
which is less than $2^{24}$, so that an entrance fee of twelve pence
would seem quite sufficient. 
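The arithmetic of this estimate is easily reproduced. The sketch below assumes, as the text does, that in $2^n$ games exactly $2^{n-k}$ of them pay $2^{k-1}$ pence, and it ignores the odd game; the name `fair_fee` is illustrative only.

```python
def fair_fee(n):
    """Fee a with 2**n * a = n * 2**(n-1), the odd game being ignored."""
    # 2**(n-k) games end at the k-th spin and pay 2**(k-1) pence each.
    total_winnings = sum(2 ** (n - k) * 2 ** (k - 1) for k in range(1, n + 1))
    return total_winnings / 2 ** n

games = 100 * 8 * 300 * 50          # turns per hour, hours, days, years
print(games, games < 2 ** 24, fair_fee(24))  # 12000000 True 12.0
```

Every class of game contributes the same total, $2^{n-1}$ pence, which is why the fee grows only as $n/2$.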

Let us interpolate at this point a problem of historical
interest which was proposed to Pascal by our old friend the 
Chevalier de Méré.* 


Example 14] Two players whose chances of winning an
individual turn are p and q respectively, a tie being
impossible, are forced to break off a game before the end.
The first player A is within m turns of victory, while
the second player B is within n turns of victory, how
should the stakes be divided?


We must calculate the chance of one player, say A. Here
is De Montmort’s solution.†

A may win in various ways which are mutually exclusive:

(1) He may win the next m turns, chance $= p^m$.

(2) In the next m+1 turns he may win the last, and some
other m−1, chance $= m p^m q$.

(3) In the next m+2 turns he may win the last, and m−1
of the others, chance $= m(m+1)\,p^m q^2/2$.

Hence his chance of winning the game is

$$p^m \left[ 1 + mq + \frac{m(m+1)}{2!} q^2 + \dots + \frac{m(m+1)\dots(m+n-2)}{(n-1)!} q^{n-1} \right].$$

* Pascal, Œuvres, cit., vol. iv, p. 360.
† loc. cit., p. 244.
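De Montmort's series lends itself to direct computation. The following sketch evaluates it with exact fractions; the function name `chance_first_player` is an illustrative choice, and the coefficient $m(m+1)\dots(m+k-1)/k!$ is written as the binomial coefficient to which it is equal.

```python
from fractions import Fraction
from math import comb

def chance_first_player(m, n, p):
    """Chance that A wins m turns before B wins n, by the series above."""
    q = 1 - p
    # Term k: A wins the last of m+k turns and m-1 of the others.
    return sum(comb(m + k - 1, k) * p ** m * q ** k for k in range(n))

print(chance_first_player(1, 2, Fraction(1, 2)))  # 3/4
print(chance_first_player(2, 3, Fraction(1, 2)))  # 11/16
```

The stakes should then be divided in the ratio of this chance to its complement.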


§ 4. Risk.


It sometimes happens that we are interested in knowing, 
not merely the total expectation, but the negative part which 
is to be feared. For instance, a player who should undertake 
to play the Petrograd game because of the infinite expectation 
would do a foolish thing, thanks to the large negative part 
of the expectation. More generally, a player should not enter 
a game, no matter how brilliant the prospects, if the expecta- 
tion of loss be too large a proportion of his fortune. 


Definition] The absolute value of that part of the total 
expectation which includes the negative terms, and 
these only, shall be called the risk. 

Suppose that a man has the chances $p_1, p_2, \dots, p_n$ to win the
positive sums $s_1, s_2, \dots, s_n$. He will pay for the privilege of
entering a sum equal to his expectation $e$. Arranging the
sums in the decreasing order of magnitude, so that $s_1, \dots, s_m$
are those which exceed $e$, his expectation of gain is

$$\sum_{i=1}^{i=m} p_i (s_i - e),$$

while his risk is

$$r = \sum_{i=m+1}^{i=n} p_i (e - s_i).$$

Suppose that he insures himself against loss by paying this 
sum to a speculative company which agrees to pay whatever 
loss he may sustain in the game. 




His present expectation of gain is

$$\sum_{i=1}^{i=m'} p_i (s_i - e - r),$$

his present risk is

$$\sum_{i=m'+1}^{i=m} p_i (r + e - s_i) + \sum_{i=m+1}^{i=n} p_i r = \sum_{i=m'+1}^{i=m} p_i (e - s_i) + \sum_{i=m'+1}^{i=n} p_i r.$$

Now

$$r \sum_{i=m'+1}^{i=n} p_i < r; \qquad \sum_{i=m'+1}^{i=m} p_i (e - s_i) < 0.$$

Hence, for both reasons, the risk is less than r, the previous
risk, but the expectation of gain is reduced by a corresponding
figure.

If a man have the chances $q_1, q_2, \dots, q_n$ to lose the sums
$s_1, s_2, s_3, \dots, s_n$, his risk will be $q_1 s_1 + q_2 s_2 + \dots + q_n s_n$, which will
be a minimum if all the money be placed on the safest chance, 
but the chance of total loss will, of course, be much greater. 

Let us conclude by returning, for a moment, to the question 
of fair and unfair turns. It sometimes occurs to a player 
that he will be sure to win a game if he make the resolution, 
and stick to it, which may be difficult, of stopping play as 
soon as he has lost a turn. Assuming that his stake is a, his 
adversary’s b, the respective chances (tie excluded) p and q, 
and that the game is fair, we may change the unit of coinage 
so that his stake is p, and his adversary’s q. His expectation
is then

$$-pq + pq(q-p) + p^2 q(2q-p) + \dots$$

The first term, and perhaps some of the subsequent ones, is
negative, but we soon find positive terms, so that the risk
is small. Evaluating this we get

$$-pq\,[1 + (p-q) + p(p-2q) + \dots]$$
$$= -pq\,[1 + (p - 1 + p) + p(p - 2\{1-p\}) + \dots]$$
$$= 0.$$

The resolution to stop after the first loss is unwise, except 
for a very poor man. It will discontent the adversary, because 
it seems unsportsmanlike, but will not change a zero expecta- 
tion into a positive one. 
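The vanishing of this expectation can be watched numerically. The sketch below sums the first terms of the series, the term of index k being the case of k wins followed by the first loss; the name `stop_loss_expectation` is illustrative and not taken from the text.

```python
from fractions import Fraction

def stop_loss_expectation(p, terms):
    """Partial sum of -pq + pq(q-p) + p^2 q(2q-p) + ... (stakes p and q)."""
    q = 1 - p
    # After k wins and one loss the player has gained k*q and lost p.
    return sum(p ** k * q * (k * q - p) for k in range(terms))

p = Fraction(2, 3)
for terms in (1, 5, 50, 200):
    print(terms, float(stop_loss_expectation(p, terms)))
```

Under these assumptions the partial sum of K terms works out to $-Kqp^K$, which is negative for every K and tends to zero.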


CHAPTER III 


BERNOULLI’S THEOREM 
§ 1. The Problem of Repeated Trials. 


THE celebrated theorem which gives the title to the present 
chapter is of central importance in the theory of mathematical 
probability. Certain persons have argued that it gives 
a proof of our second empirical assumption. This is an error.
No mathematical formula can prove this assumption which is 
deduced from experience of concrete cases. The confusion 
arises from the fact that the theorem deals with ratios which 
arise when an experiment is tried a large number of times 
under identical conditions. 


Fundamental example] The probability for success in a certain
trial is p, the contrary probability for failure is
q = 1−p. If n trials be made under the same essential
conditions, what is the probability for exactly r successes
and n−r failures?


The probability of starting off with r successes, and follow-
ing with failures thereafter is $p^r q^{n-r}$, and the probability
sought is the product of this multiplied by the number of
ways in which the n trials can be divided into r successes
and n−r failures, namely

$$\frac{n!}{r!\,(n-r)!}\, p^r q^{n-r}. \qquad (1)$$

Before deducing results from this very important formula, 
we shall give one or two auxiliary results which are of 
interest. 
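Formula (1) is readily evaluated with exact fractions. The following is a minimal sketch; the name `prob_successes` and the sample values (four trials with chance 1/6) are chosen for illustration only.

```python
from fractions import Fraction
from math import comb

def prob_successes(n, r, p):
    """Formula (1): chance of exactly r successes in n trials."""
    return comb(n, r) * p ** r * (1 - p) ** (n - r)

p = Fraction(1, 6)
print(prob_successes(4, 1, p))                         # 125/324
print(sum(prob_successes(4, r, p) for r in range(5)))  # 1
```

That the probabilities for r = 0, 1, ..., n add up to unity is merely the binomial expansion of $(p+q)^n$.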

Example 1] The probabilities being as above, it is agreed
that a man shall receive one dollar for each trial 
necessary to achieve exactly r successes; what is his 
expectation ? 


This amounts, by theorem 1 of the last chapter, to asking
what will be the average number of trials necessary to achieve
r successes. If this number be n we have

$$n = r p^r + (r+1)\, r p^r q + (r+2)\, \frac{r(r+1)}{2!} p^r q^2 + \dots$$
$$= r/p. \qquad (2)$$

This suggests the idea that since the average value of n,
when r is given, is r/p, so the average value of r, when n is
given, will be np, and this we shall soon see to be the case.
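The series (2) converges quickly enough to be summed numerically. In the sketch below the name `mean_trials` and the cut-off `terms` are assumptions of the illustration; the coefficient of $p^r q^k$ is the number of ways of placing r−1 successes among the first r+k−1 trials.

```python
from math import comb

def mean_trials(r, p, terms=2000):
    """Partial sum of (2): the r-th success falls on trial r+k."""
    q = 1 - p
    return sum((r + k) * comb(r + k - 1, k) * p ** r * q ** k
               for k in range(terms))

print(mean_trials(3, 0.25), 3 / 0.25)
```

With r = 3 and p = 1/4 the partial sum should agree with r/p = 12 to within rounding error.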


Example 2] In n successive trials of an event the probabilities
for success being $p_1, p_2, \dots, p_n$ respectively, what is the
probability for just r successes?

Let the equation whose roots are $p_1, p_2, \dots, p_n$ respectively
be written

$$(x-p_1)(x-p_2)\dots(x-p_n) = x^n - s_1 x^{n-1} + s_2 x^{n-2} - \dots = 0.$$

The probability sought will be

$$P_r = \sum p_\alpha p_\beta \dots p_\rho\,(1-p_\sigma)\dots(1-p_\omega),$$

where the first products give every term of $s_r$, and the p’s in
the last factors are those which do not appear in the first
ones. Multiplying out we get

$$P_r = s_r - (r+1)\,s_{r+1} + \frac{(r+1)(r+2)}{2!}\,s_{r+2} - \dots \qquad (3)$$

This is known as ‘King’s formula’.
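King's formula may be compared with a direct enumeration for a small number of trials. The sketch below is illustrative only: the function names are not from the text, and the coefficients of (3) are written as binomial coefficients, $(r+1)(r+2)/2!$ being the number of combinations of r+2 things taken r at a time.

```python
from fractions import Fraction
from itertools import combinations
from math import comb, prod

def elementary_symmetric(ps, k):
    """s_k: sum of the products of the p's taken k at a time (s_0 = 1)."""
    return sum(prod(c) for c in combinations(ps, k))

def king(ps, r):
    """Formula (3): P_r = s_r - (r+1) s_{r+1} + (r+1)(r+2)/2! s_{r+2} - ..."""
    n = len(ps)
    return sum((-1) ** (k - r) * comb(k, r) * elementary_symmetric(ps, k)
               for k in range(r, n + 1))

def direct(ps, r):
    """Sum over every choice of the r trials which succeed."""
    n = len(ps)
    return sum(prod(ps[i] if i in idx else 1 - ps[i] for i in range(n))
               for idx in combinations(range(n), r))

ps = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 5)]
print([king(ps, r) for r in range(4)])
print([direct(ps, r) for r in range(4)])
```

The two lists should coincide, and each should add up to unity.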
Problems.
1. Deduce formula (1) as a special case of (3).
2. An event can happen in k mutually exclusive ways, and no others,
the respective probabilities being $p_1, p_2, \dots, p_k$. Find the probability that in
n trials it will happen in the first way $r_1$ times, in the second $r_2$ times, in
the kth $r_k$ times.

We now return to our formula (1) and ask the important
question, for what value of r will this be a maximum? To
find this maximum, we write the ratios of this term to the
preceding and to the succeeding ones. The first ratio will
be greater than or equal to unity when

$$\frac{n-r+1}{r} \cdot \frac{p}{q} \ge 1,$$
$$(n+1)p - rp \ge rq,$$
$$(n+1)p \ge r.$$

In the same way, the second ratio will be greater than or
equal to unity when

$$\frac{r+1}{n-r} \cdot \frac{q}{p} \ge 1,$$
$$rq + q \ge np - rp,$$
$$r \ge np - q.$$

We have, thus, for our largest term

$$np + p \ge r \ge np - q.$$
The two limits differ from one another by unity. 


Theorem 1] If the probability for success of an event be p, that
for failure q, in n trials the most likely number of
successes will be that integer which lies between the
limits np+p and np−q.
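Theorem 1 may be verified by comparing the integers within the stated limits with the largest terms of formula (1). The following sketch assumes that p is given as an exact fraction; the function names are illustrative.

```python
from fractions import Fraction
from math import ceil, comb, floor

def most_likely(n, p):
    """Integers r with np - q <= r <= np + p (Theorem 1)."""
    q = 1 - p
    return list(range(ceil(n * p - q), floor(n * p + p) + 1))

def largest_terms(n, p):
    """Values of r for which formula (1) is greatest, by direct search."""
    probs = [comb(n, r) * p ** r * (1 - p) ** (n - r) for r in range(n + 1)]
    best = max(probs)
    return [r for r, pr in enumerate(probs) if pr == best]

print(most_likely(10, Fraction(1, 3)), largest_terms(10, Fraction(1, 3)))  # [3] [3]
print(most_likely(5, Fraction(1, 2)), largest_terms(5, Fraction(1, 2)))    # [2, 3] [2, 3]
```

When both limits are themselves integers, as in the second case, two adjacent terms are equal and both are returned.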

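In practice, it is usual to take np as the most likely value
for r; the number

$$d = r - np \qquad (4)$$

is called the discrepancy. Let us find its average value, that
is to say, the expectation of a man who will receive a sum
equal to this discrepancy.

$$\sum_{r=0}^{r=n} (r-np) \frac{n!}{r!\,(n-r)!} p^r q^{n-r} = \sum_{r=0}^{r=n} r \frac{n!}{r!\,(n-r)!} p^r q^{n-r} - np\,(p+q)^n.$$

Now

$$p \frac{\partial}{\partial p} (p+q)^n = \sum_{r=0}^{r=n} r \frac{n!}{r!\,(n-r)!} p^r q^{n-r}, \qquad p+q = 1.$$

Hence we have

$$p \frac{\partial}{\partial p} (p+q)^n - np\,(p+q)^n = np - np = 0.$$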




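This shows that np is not only the most likely value of r,
but is also its average value. Let us now find the average
value of the square of the discrepancy. This will be

$$\sum_{r=0}^{r=n} (r-np)^2 \frac{n!}{r!\,(n-r)!} p^r q^{n-r}$$
$$= \sum r^2 \frac{n!}{r!\,(n-r)!} p^r q^{n-r} - 2np \sum r \frac{n!}{r!\,(n-r)!} p^r q^{n-r} + n^2 p^2 (p+q)^n$$
$$= \sum r^2 \frac{n!}{r!\,(n-r)!} p^r q^{n-r} - n^2 p^2.$$

But

$$\sum r^2 \frac{n!}{r!\,(n-r)!} p^r q^{n-r} = p \frac{\partial}{\partial p}\left[ p \frac{\partial}{\partial p} (p+q)^n \right].$$

Hence our first term is

$$p \frac{\partial}{\partial p}\left[ p \frac{\partial}{\partial p} (p+q)^n \right] = p \frac{\partial}{\partial p}\left[ np\,(p+q)^{n-1} \right]$$
$$= np\,(p+q)^{n-1} + n(n-1)\,p^2 (p+q)^{n-2}$$
$$= np + n^2 p^2 - np^2.$$

Our average value is thus

$$np - np^2 = npq. \qquad (5)$$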
The expression d/n = r/n—p (6) 
is called the relative discrepancy, for it is the discrepancy 
between the actual proportion of successes and the average 
proportion of successes. The average value of its square is 
pq/n. (7) 
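Results (5) and (7), together with the vanishing of the average discrepancy, can be confirmed by summing formula (1) directly. The sketch below uses exact fractions; the name `discrepancy_moment` and the sample values n = 12, p = 1/4 are illustrative.

```python
from fractions import Fraction
from math import comb

def discrepancy_moment(n, p, power):
    """Average value of (r - np)**power under formula (1)."""
    q = 1 - p
    return sum((r - n * p) ** power * comb(n, r) * p ** r * q ** (n - r)
               for r in range(n + 1))

n, p = 12, Fraction(1, 4)
print(discrepancy_moment(n, p, 1))           # 0
print(discrepancy_moment(n, p, 2))           # 9/4, i.e. npq
print(discrepancy_moment(n, p, 2) / n ** 2)  # 1/64, i.e. pq/n
```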
Let us see what is the probability of a discrepancy not 
greater numerically than a given positive number D. The 
number of such discrepancies is 2D +1 and the probability is 
less than this number multiplied by the probability of a zero 
discrepancy. Let us calculate the variation of this latter as 
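n increases indefinitely. We have to find the limit of

$$\frac{n!}{r!\,(n-r)!} \left(\frac{r}{n}\right)^r \left(\frac{n-r}{n}\right)^{n-r}.$$

Let us first find for what value of r this will be a minimum,
n being fixed. Changing r to r+1 we get

$$\frac{n!}{(r+1)!\,(n-r-1)!} \left(\frac{r+1}{n}\right)^{r+1} \left(\frac{n-r-1}{n}\right)^{n-r-1}.$$

This divided by the previous expression will be

$$\left(1 + \frac{1}{r}\right)^r \left(1 - \frac{1}{n-r}\right)^{n-r-1}.$$

How long will this expression be as great as 1? Evidently
as long as its logarithm is not negative,

$$r \log\left(1 + \frac{1}{r}\right) + (n-r) \log\left(1 - \frac{1}{n-r}\right) - \log\left(1 - \frac{1}{n-r}\right) \ge 0.$$

Approximately, if r and n−r be very large,

$$\left(1 - \frac{1}{2r}\right) - \left(1 + \frac{1}{2(n-r)}\right) + \frac{1}{n-r} \ge 0,$$
$$\frac{1}{2(n-r)} \ge \frac{1}{2r},$$
$$n - r \le r.$$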


Theorem 2] When the number of trials is given, the prob-
ability of a zero discrepancy is minimum when the
probability of success for an individual trial is equal
to one half.

It would really be more interesting to know for what
value of r this probability would be a maximum, not a mini-
mum, but this problem does not yield to treatment so easily
as the other. Instead we shall show that for every value
of r, other than n or 0, the probability of a 0 discrepancy
approaches 0 as a limit, when n increases indefinitely.

The probability for a discrepancy d = r − np is

$$\frac{n!}{(np+d)!\,(nq-d)!}\, p^{np+d} q^{nq-d},$$

$$\sum_{d=-np}^{d=nq} \frac{n!}{(np+d)!\,(nq-d)!}\, p^{np+d} q^{nq-d} = 1.$$

The greatest term is

$$\frac{n!}{(np)!\,(nq)!}\, p^{np} q^{nq}.$$

Suppose that $pq \ne 0$ and that, contrary to fact, this greatest
term is always $> K > 0$. Then

$$\frac{n!}{(np+d)!\,(nq-d)!}\, p^{np+d} q^{nq-d} = \frac{nq(nq-1)\dots(nq-d+1)}{(np+1)(np+2)\dots(np+d)} \left(\frac{p}{q}\right)^d \frac{n!}{(np)!\,(nq)!}\, p^{np} q^{nq}$$
$$> \frac{nq(nq-1)\dots(nq-d+1)}{(np+1)(np+2)\dots(np+d)} \left(\frac{p}{q}\right)^d K.$$

For any fixed d, the limit of this as $n \to \infty$ is K. Now
let $\nu$ be so large that $\nu K > 1$. Then let n be so large that
$np > \nu$, $nq > \nu$. The limit of each of the $\nu$ largest terms is
K, hence the limit of their sum is $\nu K > 1$, which is incon-
sistent with the fact that the sum of all terms is 1. This
proves the falsity of the assumption that the largest term is
always greater than K. The probability of a discrepancy
will thus approach 0 as a limit. The same is true of 2D+1
times this probability, a number greater than the probability
of a discrepancy numerically not greater than D.

In the case of the relative discrepancy the matter is exactly 
reversed. We see from (7) that the average value of the 
square of this decreases indefinitely as n increases, hence
the probability that the square should have a value as great
as any finite value approaches zero as a limit. We thus get

Bernoulli’s complete theorem] When the number of trials is
increased indefinitely, the probability that the dis-
crepancy shall remain numerically less than any given
number, and the probability that the relative dis-
crepancy shall remain numerically greater than any
given number, will both approach zero as a limit.

This theorem is always regarded as central in the whole 
doctrine of probability, and although it emphatically does not 
tell us anything as to how events have to occur, it is, under 
the conditions of our first empirical assumption, exceedingly 
illuminating as to the way that they usually occur. Of 
course, to such a writer as Keynes, to whom mathematical 
probability appears but a subsidiary part of the whole subject, 
Bernoulli’s theorem is of secondary importance,* but in any 
objective treatment it must be fundamental. We shall later 
suggest another much shorter proof, which depends, un- 
fortunately, upon the use of approximate expressions to be 
developed presently. The proof given above is, perhaps, new. 
Bernoulli himself considered only the case of the relative 
discrepancy, his statement being as follows: + 

Sit igitur numerus casuum fertilium ad numerum steri-
lium vel praecise, vel proxime in ratione r/s adeoque ad
numerum omnium in ratione r/(r+s) seu r/t quam rationem
terminent limites (r+1)/t, (r−1)/t. Ostendendum est, tot posse
capi experimenta, ut datis quotlibet (puta c) vicibus, verisimi-
lius evadat numerum fertilium observationum intra hos
limites quam extra casurum esse h.e. numerum fertilium ad
numerum omnium observationum rationem habiturum nec
majorem quam (r+1)/t nec minorem quam (r−1)/t.

* Keynes, loc. cit., pp. 386-45.
† Jacobus Bernoulli, Ars Conjectandi, Basle, 1713, p. 236.


§ 2. Stirling’s Formula. 


In our formula (1) for repeated trials, as well as in many
formulae of an elementary nature, we have to do with
factorials. These are easy to write, and not hard to evaluate 
when the numbers involved are small, but can become 
exceedingly difficult to estimate when the highest factor is 
large. It is therefore extremely useful to be able to replace 
them by approximate values. 

What do we mean by an approximate value for an ex- 
pression? Ordinarily these words signify another expression, 
differing from the first by a small quantity ; so small, in fact, 
that its presence may be overlooked. Unfortunately we have 
no such scheme for approximating to factorials. But when 
a function of a certain argument increases indefinitely with 
that argument, then a new function bearing to the first 
a ratio that approaches unity as a limit may be used to 
replace it in the sense that the error will bear an infinitesimal 
ratio to the function itself. The difference between the two 
functions may actually increase indefinitely, but if their ratio 
approach unity as a limit, then in ratio problems the one 
may be used safely as an approximate representation of the 
other. For instance, the difference between the two functions 
of x, $x^2$ and $x^2 + x$, increases indefinitely, but their ratio
approaches 1 as a limit. A function whose ratio to a given 
function approaches 1 as a limit, as the two increase in- 
definitely, is called an asymptotic expression for the given 
function. It is our present task to find such an expression 
for n!. We shall do so by guessing at various factors, till 
we reach a form where the unknown factor may be treated as 
a constant. 

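As a first approximation, let us note that n! has n factors,
the largest of which is n. We therefore start with the crude
assumption

$$n! = n^n \phi(n),$$
$$(n+1)! = (n+1)^{n+1} \phi(n+1).$$

Dividing,

$$\phi(n)/\phi(n+1) = (1 + 1/n)^n,$$
$$\lim_{n \to \infty} \frac{\phi(n)}{\phi(n+1)} = e,$$
$$\lim_{n \to \infty} \frac{e^{-n}}{e^{-(n+1)}} = e.$$

This suggests a second factor, and we write

$$n! = n^n e^{-n} \psi(n),$$
$$\frac{\psi(n)}{\psi(n+1)} = \frac{1}{e} \left(1 + \frac{1}{n}\right)^n,$$
$$\log \psi(n) - \log \psi(n+1) = -1 + n \left( \frac{1}{n} - \frac{1}{2n^2} + \dots \right),$$
$$\log n^{1/2} - \log (n+1)^{1/2} = -\tfrac{1}{2} \log \left(1 + \frac{1}{n}\right),$$
$$\lim_{n \to \infty} \frac{\log \psi(n) - \log \psi(n+1)}{\log n^{1/2} - \log (n+1)^{1/2}} = 1.$$

This leads to our next approximation

$$n! = n^n e^{-n} n^{1/2} F(n),$$
$$\frac{F(n)}{F(n+1)} = \frac{1}{e} \left(1 + \frac{1}{n}\right)^{n + 1/2},$$
$$\log F(n) - \log F(n+1) = -1 + \left(n + \tfrac{1}{2}\right) \left( \frac{1}{n} - \frac{1}{2n^2} + \frac{1}{3n^3} - \dots \right)$$
$$= \frac{1}{12n^2} - \dots$$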




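The series on the right is convergent, and is composed of
terms which are alternately positive and negative. The sum
of a number of terms is alternately greater than or less than
the limit. Hence

$$\log F(n) > \log F(n+1) > \log F(n) - \frac{1}{12n^2},$$
$$\log F(n) > \log F(\infty) > \log F(n) - \frac{1}{12} \left( \frac{1}{n^2} + \frac{1}{(n+1)^2} + \frac{1}{(n+2)^2} + \dots \right),$$
$$\frac{1}{n^2} + \frac{1}{(n+1)^2} + \frac{1}{(n+2)^2} + \dots < \frac{1}{(n-1)n} + \frac{1}{n(n+1)} + \frac{1}{(n+1)(n+2)} + \dots$$

The convergent series on the right may be written

$$\left( \frac{1}{n-1} - \frac{1}{n} \right) + \left( \frac{1}{n} - \frac{1}{n+1} \right) + \dots = \frac{1}{n-1},$$
$$\log F(n) > \log F(\infty) > \log F(n) - \frac{1}{12(n-1)}.$$

Again

$$\log \left( 1 + \frac{1}{10n} \right) > \frac{1}{10n} - \frac{1}{200n^2} > \frac{1}{12(n-1)}$$

when n > 6. Hence

$$\log F(n) > \log F(\infty) > \log F(n) - \log \left( 1 + \frac{1}{10n} \right),$$
$$1 < \frac{F(n)}{F(\infty)} < 1 + \frac{1}{10n}.$$

If thus, when n > 6 we replace F(n) by F(∞), we have
divided by a factor lying between the limits 1 and $1 + \frac{1}{10n}$.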

There remains only the task of finding the value of the
constant F(∞). We do this by a roundabout method.

Let

$$I(m) = \int_0^{\pi/2} \sin^m x \, dx,$$
$$I(2m) > I(2m+1) > I(2m+2),$$
$$1 > \frac{I(2m+1)}{I(2m)} > \frac{I(2m+2)}{I(2m)}.$$

It will appear in the course of our work that

$$\lim_{m \to \infty} \frac{I(2m+2)}{I(2m)} = 1.$$

$$\int_0^{\pi/2} \sin^n x \, dx = -\left[ \sin^{n-1} x \cos x \right]_0^{\pi/2} + (n-1) \int_0^{\pi/2} \sin^{n-2} x \cos^2 x \, dx$$
$$= \frac{n-1}{n} \int_0^{\pi/2} \sin^{n-2} x \, dx;$$

$$I(2m) = \frac{2m-1}{2m} \cdot \frac{2m-3}{2(m-1)} \cdots \frac{1}{2} \cdot \frac{\pi}{2},$$
$$I(2m+1) = \frac{2m}{2m+1} \cdot \frac{2(m-1)}{2m-1} \cdots \frac{2}{3} \cdot 1,$$

$$\frac{I(2m+1)}{I(2m)} = \frac{[2m \cdot 2(m-1) \cdots 2]^2}{(2m+1)\,[(2m-1)(2m-3) \cdots 1]^2} \cdot \frac{2}{\pi}$$
$$= \frac{[2m \cdot 2(m-1) \cdots 2]^4}{(2m+1)\,[(2m)!]^2} \cdot \frac{2}{\pi}$$
$$= \frac{2^{4m} m^{4m} e^{-4m} (\sqrt{m})^4 [F(m)]^4}{(2m+1)\,(2m)^{4m} e^{-4m} (\sqrt{2m})^2 [F(2m)]^2} \cdot \frac{2}{\pi}$$
$$= \frac{m}{(2m+1)\pi} \cdot \frac{[F(m)]^4}{[F(2m)]^2}.$$

Passing to the limit,

$$1 = \frac{F(\infty)^2}{2\pi},$$
$$F(\infty) = \sqrt{2\pi}.$$


Stirling’s formula: If the expression n! be replaced by the
expression $n^n e^{-n} \sqrt{2\pi n}$ the true value will have been
divided by a number lying between 1 and $1 + \frac{1}{10n}$.*

* Stirling, Methodus differentialis, &c., London, 1764, p. 138.




A table of the values of $\log e^x$ and $\log e^{-x}$ will be found at
the end of the volume.
We shorten this formula by writing the untrue equation

$$n! = n^n e^{-n} \sqrt{2\pi n}. \qquad (8)$$


The development above shows that we should be more accurate
if we wrote


i — Ten "V2rn(d + 
12" 1b 


The gain in accuracy is, for most purposes, not worth the 
additional complication. For instance, we find * 
$$10! = 3{,}628{,}800, \qquad 10^{10} e^{-10} \sqrt{20\pi} = 3{,}598{,}699.$$

Difference = 30,101. Ratio = 1.008.
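The limits stated in Stirling's formula can be tested numerically. The following sketch uses ordinary floating-point arithmetic, so that the last digits are not to be trusted; the name `stirling` is illustrative.

```python
from math import exp, factorial, pi, sqrt

def stirling(n):
    """The approximate expression n^n e^{-n} sqrt(2 pi n) of formula (8)."""
    return n ** n * exp(-n) * sqrt(2 * pi * n)

for n in (7, 10, 50):
    ratio = factorial(n) / stirling(n)
    # The true value divided by the approximation should lie
    # between 1 and 1 + 1/(10n) when n > 6.
    print(n, ratio, 1 < ratio < 1 + 1 / (10 * n))
```

For n = 10 floating-point evaluation gives about 3,598,696, which agrees with the figure quoted above to six significant digits.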

As an example of the use of Stirling’s formula, let us calculate
the probability of a zero discrepancy. We have by (1)
and (8)

$$\frac{n!}{(np)!\,(nq)!}\, p^{np} q^{nq} = \frac{n^n e^{-n} \sqrt{2\pi n}}{(np)^{np} (nq)^{nq} e^{-n(p+q)}\, 2\pi n \sqrt{pq}}\, p^{np} q^{nq} = \frac{1}{\sqrt{2\pi npq}}. \qquad (9)$$

This expression decreases indefinitely as n increases, which
gives an immediate proof of 2] and of Bernoulli’s theorem.
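Formula (9) may be set beside the exact value from formula (1). The sketch below takes p = r/n so that the discrepancy is exactly zero; the function names are illustrative, and the case of 100 throws of a coin is chosen merely as an instance.

```python
from math import comb, pi, sqrt

def zero_discrepancy_exact(n, r):
    """Formula (1) with p = r/n, so that r = np exactly."""
    p = r / n
    return comb(n, r) * p ** r * (1 - p) ** (n - r)

def zero_discrepancy_approx(n, r):
    """Formula (9): 1 / sqrt(2 pi n p q)."""
    p = r / n
    return 1 / sqrt(2 * pi * n * p * (1 - p))

print(zero_discrepancy_exact(100, 50), zero_discrepancy_approx(100, 50))
```

For 50 heads in 100 throws the exact value is about 0.0796 and the approximation about 0.0798.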


§3. The Probability Integral. 


We have seen, by two different methods, that as n increases,
the probability of any one discrepancy, even the most likely,
approaches 0 as a limit. In order, therefore, to concern our-
selves with probabilities of finite magnitude, it is wise to
change our problem, and calculate the probability that the
discrepancy shall lie within specified limits. We are immediately
faced by the question, What will be limits of a reason-
able size? We have seen that the average value of the square
of the discrepancy is proportional to n, and this suggests the
propriety of calculating the probability of a discrepancy lying
between two different constant multiples of $\sqrt{n}$. In particular,
let us calculate the probability of a discrepancy lying between
$z_1 \sqrt{2npq}$ and $z_2 \sqrt{2npq}$, including the limits.

* Czuber, Wahrscheinlichkeitsrechnung, cit. p. 24.

Problem.
A coin is thrown 100 times; calculate the probability that it will
show exactly 50 heads and 50 tails.

We first revert to (1), putting

$$r = np + z\sqrt{2npq}, \qquad n - r = nq - z\sqrt{2npq}.$$

We seek

$$\sum_{z=z_1}^{z=z_2} \frac{n!}{(np + z\sqrt{2npq})!\,(nq - z\sqrt{2npq})!}\; p^{np + z\sqrt{2npq}}\, q^{nq - z\sqrt{2npq}}.$$


The deficiency increases by 1 each time, and the expression
above is the sum of a number of ordinates at unit intervals,
and so the sum of the areas of a system of rectangles of unit
base and varying altitude. We wish to find the limit of this
sum as n increases indefinitely.

The next point to note is that as n increases, instead of
imagining the height of each rectangle to decrease toward 0
while the total base length increases indefinitely, we might
equally well imagine that the total base $(z_2 - z_1)\sqrt{2npq}$
remains constant, while the bases of the individual rect-
angles approach 0 as a limit, so that the sum of the areas of
the rectangles approaches the area under a smooth curve as a
limit. Our problem is to find the nature of this curve. We
have, essentially, an infinite sum of infinitesimal terms, and
Duhamel’s theorem tells us that we may replace each by
another infinitesimal bearing to it a ratio which approaches 1
as a limit.* In fact, it will be wise to split each of the
various quantities to be summed into factors and replace each
factor in this way. As a first substitution, we replace the
various factorials by their equivalents from Stirling’s formula
(8), getting

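$$\sum_{z=z_1}^{z=z_2} \frac{n^n e^{-n} \sqrt{2\pi n}\;\; p^{np + z\sqrt{2npq}}\, q^{nq - z\sqrt{2npq}}}{(np + z\sqrt{2npq})^{np + z\sqrt{2npq}}\, (nq - z\sqrt{2npq})^{nq - z\sqrt{2npq}}\, e^{-n}\, 2\pi \sqrt{(np + z\sqrt{2npq})(nq - z\sqrt{2npq})}}$$
$$= \sum_{z=z_1}^{z=z_2} \frac{1}{\sqrt{2\pi}} \sqrt{\frac{n}{(np + z\sqrt{2npq})(nq - z\sqrt{2npq})}} \left( \frac{np}{np + z\sqrt{2npq}} \right)^{np + z\sqrt{2npq}} \left( \frac{nq}{nq - z\sqrt{2npq}} \right)^{nq - z\sqrt{2npq}}.$$

* This theorem is at the basis of the integral calculus. See e.g. Osgood,
Differential and Integral Calculus, New York, 1907, p. 164; Annals of Mathematics,
Series 2, vol. iv.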

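We next note that as the quantity $z\sqrt{2npq}$ increases by
unity each time, we may properly say that z has an increment
$\Delta z$ where $\Delta z = 1/\sqrt{2npq}$.

$$\lim_{n \to \infty} \frac{1}{\sqrt{2\pi}} \sqrt{\frac{n}{(np + z\sqrt{2npq})(nq - z\sqrt{2npq})}} \Big/ \frac{\Delta z}{\sqrt{\pi}} = 1.$$

The ratio approaches 1 uniformly as we see by expanding.

$$\log \left( \frac{np + z\sqrt{2npq}}{np} \right)^{np + z\sqrt{2npq}} = (np + z\sqrt{2npq}) \left( \frac{z\sqrt{2npq}}{np} - \frac{z^2 q}{np} + \frac{1}{n^{3/2}} F_1(z) \right),$$
$$\log \left( \frac{nq - z\sqrt{2npq}}{nq} \right)^{nq - z\sqrt{2npq}} = (nq - z\sqrt{2npq}) \left( -\frac{z\sqrt{2npq}}{nq} - \frac{z^2 p}{nq} + \frac{1}{n^{3/2}} F_2(z) \right),$$
$$\log \left[ \left( \frac{np + z\sqrt{2npq}}{np} \right)^{np + z\sqrt{2npq}} \left( \frac{nq - z\sqrt{2npq}}{nq} \right)^{nq - z\sqrt{2npq}} \right] = z^2 + \frac{1}{\sqrt{n}} F(z),$$
$$\lim_{n \to \infty} \left( \frac{np}{np + z\sqrt{2npq}} \right)^{np + z\sqrt{2npq}} \left( \frac{nq}{nq - z\sqrt{2npq}} \right)^{nq - z\sqrt{2npq}} = e^{-z^2}.$$

Since F(z) remains less than a fixed amount, the ratio
approaches 1 uniformly.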




We therefore seek

$$\lim_{\Delta z \to 0} \sum_{z=z_1}^{z=z_2} e^{-z^2} \frac{\Delta z}{\sqrt{\pi}},$$

and this, by the fundamental theorem of the integral cal-
culus, is *

$$\frac{1}{\sqrt{\pi}} \int_{z_1}^{z_2} e^{-z^2} dz.$$

This formula may be made slightly more accurate in the
following fashion. When a large number of rectangles is
replaced by a curve, the number of rectangles is less by one
than the number of points where the curve meets an upright
side of a rectangle. When we replace

$$\sum_{z_1}^{z_2} f(z)\,\Delta z \quad \text{by} \quad \int_{z_1}^{z_2} f(z)\,dz,$$

if, as is perfectly legitimate, the value of f be taken for z at
the left end of the interval, then the term $f(z_2)\,\Delta z$ is lost in
passing to the integral, so that a more accurate form for the
probability will be

$$\frac{1}{\sqrt{\pi}} \int_{z_1}^{z_2} e^{-z^2} dz + \frac{1}{\sqrt{2\pi npq}}\, e^{-z_2^2}.$$
The number of consequences deducible from these formulae
is very large. Let us first find the probability that the
numerical value of the discrepancy shall not exceed d.

$$z_2 \sqrt{2npq} = -z_1 \sqrt{2npq} = d.$$

This gives us

Laplace’s theorem †] If the probability for success be p, and
that for failure q = 1−p, then the probability that in n
trials the discrepancy will not exceed numerically the
number d is nearly equal to

$$\frac{2}{\sqrt{\pi}} \int_0^{d/\sqrt{2npq}} e^{-z^2} dz \qquad (10)$$

and is still more nearly equal to

$$\frac{2}{\sqrt{\pi}} \int_0^{d/\sqrt{2npq}} e^{-z^2} dz + \frac{e^{-d^2/2npq}}{\sqrt{2\pi npq}}.$$

* Ibid., p. 155.

† Laplace, Œuvres, vol. vii, Paris, p. 284. For an estimate of the size of
the error committed by using these formulae see Castelnuovo, loc. cit.,
pp. 83 ff. The development which we have given follows the general lines of
Markoff, loc. cit.
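The two forms of Laplace's theorem may be compared with the exact sum of formula (1). The sketch below relies on the error function of the Python standard library, which is the integral $\frac{2}{\sqrt{\pi}} \int_0^x e^{-z^2} dz$; the function names and the sample case (100 trials, p = 1/2, d = 5) are illustrative.

```python
from math import comb, erf, exp, pi, sqrt

def exact(n, p, d):
    """Sum of formula (1) over all r with |r - np| <= d."""
    q = 1 - p
    return sum(comb(n, r) * p ** r * q ** (n - r)
               for r in range(n + 1) if abs(r - n * p) <= d)

def laplace(n, p, d, corrected=False):
    """Formula (10), optionally with the additional term of the second form."""
    s = 2 * n * p * (1 - p)               # 2npq
    value = erf(d / sqrt(s))              # (2/sqrt(pi)) * integral from 0 to d/sqrt(2npq)
    if corrected:
        value += exp(-d * d / s) / sqrt(pi * s)   # e^{-d^2/2npq} / sqrt(2 pi npq)
    return value

print(exact(100, 0.5, 5), laplace(100, 0.5, 5), laplace(100, 0.5, 5, corrected=True))
```

In this instance the exact value is about 0.729, formula (10) gives about 0.683, and the second form about 0.731.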
It cannot be too often emphasized that these are merely 
approximate expressions for the quantities desired. The 
integrand is an even function, suggesting that equal positive 
and negative discrepancies are equally likely, and this is not 
the case. When p < q, there are positive discrepancies possible
which are much greater than any possible negative discrepan-
cies. Let us push this a little further, before we leave the
exact formulae for ever. The probabilities for a discrepancy
d or −d, are by (1)

$$\frac{n!}{(np+d)!\,(nq-d)!}\, p^{np+d} q^{nq-d}, \qquad \frac{n!}{(np-d)!\,(nq+d)!}\, p^{np-d} q^{nq+d}.$$

The ratio of the second to the first is

$$\frac{(np+d)!\,(nq-d)!}{(np-d)!\,(nq+d)!} \left( \frac{q}{p} \right)^{2d}$$
$$= \left[ \frac{npq + qd}{npq + pd} \right] \left[ \frac{npq + q(d-1)}{npq + p(d-1)} \cdot \frac{npq - q(d-1)}{npq - p(d-1)} \right] \dots \qquad (11)$$

When p < q, and d = 1, this expression is greater than 1.
Further, when d is increased by unity, the product of the two
factors multiplied in is greater than one, as long as
npq > d(d−1). Hence the expression is surely greater than
one for these values of d, and we get


Theorem 3] When the probability for success is less than one 
half, the probability for each small possible positive 
discrepancy is less than that for the corresponding 
negative discrepancy, the ratio of the two, however, 
approaches 1 as a limit as the number of trials increases 
indefinitely. 


This may also be surmised from the fact that the average
discrepancy is zero, and as there are in the present case more
possible positive discrepancies than negative ones, it would
seem natural that in every case the probability for the positive
discrepancy would be less than that for the corresponding
negative.*

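Let us return definitely to our approximate formulae. The
function

$$\Theta(z) = \frac{2}{\sqrt{\pi}} \int_0^z e^{-z^2} dz \qquad (12)$$

is absolutely fundamental in the theory of probability. A
table of the values of this function will be found on pp. 209-
13. Let us find its value when $z \to \infty$. We wish to evaluate

$$\Theta(\infty) = \frac{2}{\sqrt{\pi}} \int_0^\infty e^{-z^2} dz.$$

Let $z = ax$,

$$\Theta(\infty) = \frac{2}{\sqrt{\pi}} \int_0^\infty a\, e^{-a^2 x^2} dx,$$

$$\frac{4}{\pi} \int_0^\infty da \int_0^\infty e^{-(1+x^2) a^2} a \, dx = \Theta(\infty) \cdot \frac{2}{\sqrt{\pi}} \int_0^\infty e^{-a^2} da.$$

We may reverse the order of integrations on the left, hence

$$[\Theta(\infty)]^2 = \frac{4}{\pi} \int_0^\infty dx \int_0^\infty e^{-(1+x^2) a^2} a \, da.$$

The first integral is

$$\left[ -\frac{e^{-(1+x^2) a^2}}{2(1+x^2)} \right]_0^\infty = \frac{1}{2(1+x^2)},$$

$$[\Theta(\infty)]^2 = \frac{2}{\pi} \int_0^\infty \frac{dx}{1+x^2} = 1,$$

$$\frac{2}{\sqrt{\pi}} \int_0^\infty e^{-z^2} dz = 1. \qquad (13)$$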
0 


* Cf. Simmons, ‘A New Theorem in Probability’, Proceedings London Math.
Soc., vol. xxvi, 1895.

At this point the careless reader might be led into a very
grievous mathematical error by supposing that this formula
was self-evident. He might say, ‘The probability that the
discrepancy will take some value between −∞ and ∞ is
equal to 1. This is the probability obtained from (10) by
letting d become infinite, which proves (13) without more
ado.’ But we must repeat again and again ad nauseam
that (10) is merely an approximate formula, for the following
reasons:


(1) The integrand is an even function, whereas we saw in 
theorem 3] that, in the general case, equal positive and nega- 
tive discrepancies are not equally likely. 


(2) We started with the assumption that we were seeking 
the probability for a discrepancy between two limits propor- 
tional to the square root of the number of trials. We obtained 
an approximate formula good for such limits. This formula 
gives a finite if extremely small probability for a discrepancy 
of every numerical magnitude, whereas it is physically impos- 
sible to have a discrepancy numerically greater than the 
larger of the two numbers np, nq. 

The verification of formula (13) by means of the approximate
formula (10) must be looked upon as a fortunate
accident.

Strangely enough there is another such accidental verification
which we now proceed to establish. We apply the law of the
mean to (10) getting, as the probability for a discrepancy close
to x, the expression

$$\frac{1}{\sqrt{2\pi npq}}\, e^{-x^2/2npq}\, dx. \qquad (14)$$

The expectation of a man who will receive a sum equal to
the square of the discrepancy, i.e. the limit of the average
value of that square if formula (10) were universally valid,
would be

$$\frac{2}{\sqrt{\pi}} \int_0^\infty x^2 e^{-x^2/2npq} \frac{dx}{\sqrt{2npq}} = \frac{2}{\sqrt{2\pi npq}} \int_0^\infty x^2 e^{-x^2/2npq}\, dx.$$

Let

$$\frac{x}{\sqrt{2npq}} = t, \qquad \frac{dx}{\sqrt{2npq}} = dt.$$

$$\text{Average} = \frac{4npq}{\sqrt{\pi}} \int_0^\infty t^2 e^{-t^2} dt.$$

Let $2t e^{-t^2} dt = dv$, $t = u$.

Since $t e^{-t^2}$ vanishes at both limits, if we integrate by parts
we find:

$$\text{Average} = \frac{2npq}{\sqrt{\pi}} \int_0^\infty e^{-t^2} dt = npq.$$

This, by a fortunate accident, checks exactly with (5).

Problem.
Prove Bernoulli’s Theorem by means of (10) and (13).

Let us calculate the expectation of a man who will receive
a sum equal to the numerical value of the discrepancy. In
this case the exact formula is hard to manipulate; we therefore,
emboldened by recent success, take formula (10) as though
it were exact and universally valid, knowing that it is always
near the truth. We have

$$\frac{2}{\sqrt{2\pi npq}} \int_0^\infty x\, e^{-x^2/2npq}\, dx = \frac{\sqrt{2npq}}{\sqrt{\pi}} \int_0^\infty 2t e^{-t^2} dt = \sqrt{\frac{2}{\pi}} \sqrt{npq}.$$

What is the value of the numerical discrepancy which
there is a half chance of reaching? We have, by (12)

$$\Theta\left( \frac{d}{\sqrt{2npq}} \right) = \frac{1}{2}.$$

Now, by our table, p. 209,

$$\Theta(0.4769) = \tfrac{1}{2}.$$

Hence

$$d = 0.4769 \sqrt{2} \sqrt{npq} = 0.6745 \sqrt{npq}.$$
Let us recapitulate these last results. The square root of
the average value of the square of the discrepancy is called
the mean discrepancy; its value is
$$\sqrt{npq}. \qquad (15)$$
The average value of the numerical value of the discrepancy
is called the average discrepancy; its value is
$$0.798\,\sqrt{npq}. \qquad (16)$$
The positive number which there is a half chance that the
numerical value of the discrepancy will not exceed, is called
the probable discrepancy; its value is
$$0.674\,\sqrt{npq}. \qquad (17)$$

Problem.
Calculate by the same two methods the average value of the fourth
power of the discrepancy.
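The three measures (15), (16), (17) can be compared with the exact binomial law. The following is a minimal sketch, not part of the original text; the function names and the choice n = 400, p = 1/2 are illustrative assumptions.

```python
import math

def binomial_pmf(n, p):
    """Exact binomial probabilities, computed through logarithms to avoid overflow."""
    log_p, log_q = math.log(p), math.log(1.0 - p)
    return [
        math.exp(math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
                 + k * log_p + (n - k) * log_q)
        for k in range(n + 1)
    ]

def discrepancy_measures(n, p):
    """Return (mean, average, probable) discrepancy from the exact distribution."""
    pmf = binomial_pmf(n, p)
    centre = n * p
    # Square root of the average squared discrepancy, cf. (15).
    mean_disc = math.sqrt(sum(w * (k - centre) ** 2 for k, w in enumerate(pmf)))
    # Average numerical value of the discrepancy, cf. (16).
    average_disc = sum(w * abs(k - centre) for k, w in enumerate(pmf))
    # Smallest whole number d with at least a half chance that |discrepancy| <= d, cf. (17).
    probable_disc = 0
    while sum(w for k, w in enumerate(pmf) if abs(k - centre) <= probable_disc) < 0.5:
        probable_disc += 1
    return mean_disc, average_disc, probable_disc

n, p = 400, 0.5                       # illustrative values, not from the text
root_npq = math.sqrt(n * p * (1 - p))
mean_disc, average_disc, probable_disc = discrepancy_measures(n, p)
print(mean_disc, root_npq)
print(average_disc, 0.798 * root_npq)
print(probable_disc, 0.674 * root_npq)
```

The first comparison is exact up to rounding; the other two rest on the approximate formula (10), and the probable discrepancy is necessarily a whole number here, so only rough agreement should be expected.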
How accurate has Bernoulli's theorem turned out in practice? 
There is a good deal of testimony on this point, generally 
highly unsatisfactory. Karl Pearson made an analysis of a large 
number of statistics from the roulette games at Monte Carlo, 
and came to the conclusion that whereas the alternation of 
red and black was satisfactory, there was an incredible excess 
of long runs.* Exactly opposite conclusions were reached by 
Marbe,† who maintained that, on the contrary, short runs
were predominately the rule. Then Grünwald took up
Marbe's work and showed, at least to his own satisfaction,
that the apparent result was due to faulty grouping of the
observations.‡ By a proper re-grouping, the results showed
a very satisfactory proportion. An account of the work of
Marbe and Grünwald is given by Czuber;§ the average reader
will strongly suspect that Grünwald was right, and the other
two wrong. 


Example 3] The philosopher Buffon || one day threw a coin
4,040 times, and noted that heads arrived 2,048 times.
Is this remarkable?

In this case p and q are both 1/2, n = 4,040, d = 28.
The chance for discrepancy of this size or less is

$$\Phi\left(\frac{28}{\sqrt{2020}}\right) = 0.622.$$


* Pearson, The Chances of Death, vol. i, London, 1897.

† Marbe, Naturphilosophische Untersuchungen zur Wahrscheinlichkeitslehre, Leipzig, 1899.

‡ Grünwald, Isolierte Gruppen und die Marbesche Zahl p'', Würzburg, 1904. I
have not been able to verify these two references taken from Czuber.

§ Wahrscheinlichkeitsrechnung, cit., vol. i, pp. 144 ff.

|| Bertrand, loc. cit., p. 9.




There are, hence, nearly four chances in ten for a discrepancy 
numerically larger than the one obtained, and the result 
must be looked upon as not unnatural. The probability for 
obtaining exactly this discrepancy is less than that for a 
discrepancy 0, which latter is 


$$\frac{1}{\sqrt{2020\,\pi}} = \frac{1}{80}.$$
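The figures of Example 3 can be rechecked numerically. This sketch assumes that the function Φ of formula (12) is the error function, Φ(x) = (2/√π)∫₀ˣ e^(−t²) dt, an assumption consistent with the tabular value Φ(0.4769) = 1/2 quoted earlier; the variable names are illustrative.

```python
import math

def phi(x):
    """Assumed form of the text's Phi: (2/sqrt(pi)) * integral of exp(-t^2) from 0 to x."""
    return math.erf(x)

n, heads = 4040, 2048
p = q = 0.5
d = heads - n * p                                  # discrepancy: 28

# Chance of a discrepancy numerically no greater than d.
chance_within = phi(d / math.sqrt(2 * n * p * q))

# Approximate chance of discrepancy exactly 0, from the approximate formula.
chance_zero = 1 / math.sqrt(2 * math.pi * n * p * q)

print(round(chance_within, 3))
print(round(1 / chance_zero, 1))
```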




Example 4] Two men, each with five pistoles, toss a coin. The
first player wins if it show heads, the second if it show
tails, the loser owing the winner one pistole. This
money is not paid immediately, but an account is kept,
the balance to be paid at the end of the game. In how
many turns will there be an even chance that the loser
is more in debt to the winner than he can pay?

Here, again, the chances are 1/2 for each. In how many
turns will the probable discrepancy be 5? Applying (17) we
have
$$0.675\,\sqrt{\tfrac14 n} = 5, \qquad n = 220.$$


It should be noted here that, if the loser had paid cash each
time, there would be more than a half chance that one player
would be ruined before now, for a discrepancy of 5 at the end
of 220 turns is compatible with a larger discrepancy at an
earlier stage of the game.


Example 5] How many times must a die be thrown to produce
a probability of 9/10 that the ratio of the number
of 3's shown shall bear to the number of trials a value
between 2/15 and 1/5?

Here we have a relative discrepancy of 1/30, hence

$$d = \frac{n}{30},\quad p = \frac16,\quad q = \frac56,\quad \Phi\left(\frac15\sqrt{\frac{n}{10}}\right) = 0.9,\quad n = 338.$$

Problems.

1. In 1850 the Swiss astronomer Wolff threw two dice 100,000 times.
The two showed the same face 16,647 times. Comment on this result.

2. The game of 'craps' was explained in a problem on p. 21. Prof. Bancroft
Brown (American Mathematical Monthly, vol. xxvi, 1919) has tabulated
the results of 9,900 turns, where the two players won 4,871 times and
5,029 times. Is this result surprising?

3. Discuss the results reached by Lieut. Hoar in dealing five cards, as
given in the problem p. 21, from the point of view of Bernoulli's theorem.

4. How many times must a coin be thrown in order that there may be
9 chances in 10 that the discrepancy is numerically greater than 5?

5. Two dice are thrown 100 times. What is the probable discrepancy
in the number of times that the sum 7 appears?
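The numbers of trials in Examples 4 and 5 can be recovered by inverting Φ. The sketch below again assumes that Φ is the error function and finds the inverse by bisection; the helper names are hypothetical.

```python
import math

def inverse_phi(target, lo=0.0, hi=6.0):
    """Solve erf(x) = target by bisection (erf is increasing on [lo, hi])."""
    for _ in range(100):
        mid = (lo + hi) / 2
        if math.erf(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def trials_needed(p, relative_discrepancy, probability):
    """Value of n at which Phi(d / sqrt(2npq)) equals `probability`, with d = n * relative_discrepancy."""
    q = 1 - p
    x = inverse_phi(probability)
    # d / sqrt(2npq) = relative_discrepancy * sqrt(n / (2pq)) = x, solved for n.
    return 2 * p * q * (x / relative_discrepancy) ** 2

n_example_5 = trials_needed(1 / 6, 1 / 30, 0.9)
# Example 4: probable discrepancy 0.6745 * sqrt(n/4) set equal to 5.
n_example_4 = 4 * (5 / 0.6745) ** 2

print(round(n_example_5), round(n_example_4))
```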


§ 4. Games of Chance.* 


There is a very real difficulty involved in handling some of 
the fundamental problems arising in games of chance owing 
to the presence of a subtle but important psychological 
element which cannot be well stated in mathematical terms. 
Gamblers are notoriously superstitious, which means that 
they are irrational, still more are they unmathematical. For 
that reason, the assumptions which are made are of a tentative 
nature, and only partially represent the real facts. 

In the games of chance of the type which we shall consider 
there are two individuals, whom we shall call the ‘Player’ 
and the ‘ Banker’ respectively. The former has a considerable 
freedom in deciding upon the amount that he will stake, but 
we shall assume that there is an upper limit to this, as, other- 
wise, a syndicate of players with a large capital might be 
formed, and this body might keep on doubling the stakes 
after each loss, till the Banker was ruined. We shall assume 
that the Player has a fortune A, and that he intends to play 
until either he has lost this amount, or won the sum B from 
the Banker. If the player have any wisdom at all, A will be 
less than his total fortune, and B far less than the sum of the 
Banker's quick assets; the word ‘ruin’ has only this technical 
sense for us. 

Let the Player's chance to win an individual turn be p, the
Banker's chance q = 1 − p, a tie being excluded. Let P be
the Player's chance to ruin the Banker, Q the chance that the
Banker will ruin the Player. We must begin by showing
that the sum of these two is 1, i.e. that there is no finite
probability that the game will continue indefinitely. The
proof is immediate when we remember Bernoulli's theorem,
for the only way to avoid the ruin of the one or the other
party is for the discrepancy to remain below a certain fixed
limit, and we have seen that the chance for this decreases
indefinitely towards 0. If each turn be fair, in the sense
defined above, i.e. if the Player's expectation be 0 each time,
we see by Ch. II, 4] that the Player's total expectation is 0,
hence
$$PB - QA = 0,\qquad P + Q = 1;$$
$$P = \frac{A}{A+B},\qquad Q = \frac{B}{A+B}. \qquad (18)$$

* The greater part of the present section will be found in an article by the
author, 'The Gambler's Ruin', Annals of Mathematics, vol. x, Series 2, 1909.
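Formula (18) can be tried out by simulation. The sketch below is an illustration under stated assumptions, not the author's procedure: unit stakes, a fair coin, and the hypothetical fortunes A = 3, B = 7, with a fixed seed so that the run is repeatable.

```python
import random

def player_ruins_banker(A, B, rng):
    """Play fair unit-stake turns until the Player has lost A or won B."""
    gain = 0
    while -A < gain < B:
        gain += 1 if rng.random() < 0.5 else -1
    return gain == B

def estimate_P(A, B, games=20000, seed=1):
    """Fraction of simulated games in which the Player wins B before losing A."""
    rng = random.Random(seed)
    wins = sum(player_ruins_banker(A, B, rng) for _ in range(games))
    return wins / games

A, B = 3, 7                      # hypothetical fortunes
estimate = estimate_P(A, B)
exact = A / (A + B)              # formula (18)
print(estimate, exact)
```

The estimate should lie close to A/(A+B), whatever the order in which wins and losses happen to fall.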
From these equations we draw two important conclusions. 
First, the Player’s chance is independent of the amount 
staked, or the game played, provided, of course, there is no 
danger that the game will not be finished for lack of time. 
The wisdom in any player's setting a low figure for B is
evident. Second, who is the Player, from the Banker's point
of view? Suppose that the Banker is running a public
gaming resort, and that the game is one of pure chance, no 
question of skill coming in. The Banker is supposed to be 
ready to play against all comers. In most cases, they will 
take opposite sides, more or less, so that his real adversary 
is the surplus of those who back one chance over those who 
back the opposite chance, but there is always the possibility 
of a large combination of players taking the same side, in 
which case the unfortunate Banker would be opposed to an
adversary of quasi infinite fortune and his ruin would be 
certain. In consequence of this, all forms of public gambling 
are somewhat favourable to the Banker, and our next step 
must be to study the chances in games of that sort. We do 
this by adopting an ingenious device, due to De Moivre, 
which consists in making the game fair once more by assign- 
ing fictitious values to the coins used.* 


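Let us assume that the Player has A individual coins
marked $\alpha^0, \alpha^1, \dots \alpha^{A-1}$, while the Banker's coins are marked
$\alpha^A, \alpha^{A+1}, \dots \alpha^{A+B-1}$. It is agreed that at each turn the
Player shall stake his a coins of highest mark, while the
Banker will reply by staking his b lowest marked coins.

* De Moivre, Doctrine of Chances, London, 1756, p. 52.

In this fictitious system the Player's expectation before
a turn is, writing $\alpha^x$ for the lowest mark among the coins he stakes,

$$p\,(\alpha^{x+a} + \alpha^{x+a+1} + \dots + \alpha^{x+a+b-1}) - q\,(\alpha^{x} + \alpha^{x+1} + \dots + \alpha^{x+a-1})$$

$$= \alpha^x\left[p\,\alpha^a\,\frac{\alpha^b - 1}{\alpha - 1} - q\,\frac{\alpha^a - 1}{\alpha - 1}\right]$$

$$= \frac{\alpha^x}{\alpha - 1}\left[p\,\alpha^{a+b} - \alpha^a + q\right].$$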


The game being unfavourable to the Player, we have
$$pb - qa < 0.$$
Consider the function
$$f(x) = p\,x^{a+b} - x^a + q,$$
$$f'(x) = x^{a-1}\left[(a+b)\,p\,x^b - a\right].$$
We make a short table of values

$$\begin{array}{lll}
x = 0, & f(x) = q, & f'(x) = 0,\\
x = 1, & f(x) = 0, & f'(x) = pb - qa < 0,\\
x = \left[\dfrac{a}{(a+b)\,p}\right]^{1/b}, & f(x) = -\dfrac{b}{a+b}\,x^a + q < 0, & f'(x) = 0,\\
x = \infty, & f(x) = \infty, & f'(x) = \infty.
\end{array}$$

Since we have noted all the real roots of f'(x) and between
each two roots of f there is one of f', we have just one real
root of f which is > 1. Call this α.

$$p\,\alpha^{a+b} - \alpha^a + q = 0,\quad \alpha > 1 \qquad (19)$$

$$p\,\alpha^{b} + q\,\alpha^{-a} = 1. \qquad (20)$$

If α be given this value, we see that the expectation of the
Player, as calculated above, is 0 in this fictitious measure.
Hence, the Player's chance is the ratio of his fortune, fictitiously
calculated, to the sum of the two fortunes in the same
measure, namely
$$\frac{\alpha^0 + \alpha^1 + \dots + \alpha^{A-1}}{\alpha^0 + \alpha^1 + \dots + \alpha^{A+B-1}} = \frac{\alpha^A - 1}{\alpha^{A+B} - 1}. \qquad (21)$$


Suppose, next, that the Player determines to reduce his
stake to a/σ for just one turn, the Banker simultaneously
reducing to b/σ; after that, all is to go on as before. Has his
chance been improved or injured? That chance, under the
present hypothesis, is

$$p\,\frac{\alpha^{A+b/\sigma} - 1}{\alpha^{A+B} - 1} + q\,\frac{\alpha^{A-a/\sigma} - 1}{\alpha^{A+B} - 1} = \frac{\alpha^{A}\left[p\,\alpha^{b/\sigma} + q\,\alpha^{-a/\sigma}\right] - 1}{\alpha^{A+B} - 1}.$$

Now since $1 < \alpha^{1/\sigma} < \alpha$,

$$f(\alpha^{1/\sigma}) < 0;\qquad p\,\alpha^{b/\sigma} + q\,\alpha^{-a/\sigma} < 1,$$
the chance is less than that given by (21).

The conclusion to be drawn from all this is moral, highly 
moral! It is unwise for the Player to reduce his first stake, 
it would be similarly unwise for him to reduce any subsequent 
stake. The series of turns is bound to run till the one party 
or the other is ruined, hence we have the 
Fundamental theorem of games of chance] The Player's best
chance of winning a stated sum at an unfavourable
game is to stake the sum which will bring that return
in one turn. If that be not allowed, he should stake at
each turn the largest amount that the Banker will
accept.

The practical gambler (if there be such a person) will 
probably reply to this: 

‘The player who stakes his whole fortune on a single turn 
is a fool, and the science of mathematics cannot prove him to 
be anything else.’ 

The answer is immediate : 

‘The science of mathematics never attempts the impossible, 
it merely shows that other players are greater fools.’ 

Let us look into certain special cases of (21). The Banker's
chance is
$$\frac{\alpha^{A+B} - \alpha^{A}}{\alpha^{A+B} - 1}. \qquad (22)$$

When A and B are equal, i.e. when the Player undertakes
to win or lose a certain sum, his chance is
$$\frac{1}{\alpha^{A} + 1}. \qquad (23)$$
Putting α = 1 + ε we see that his chance is less than
$$\frac{1}{2 + A\epsilon}. \qquad (24)$$

When a = b we may take each of them equal to 1, we have
then
$$\alpha = q/p. \qquad (25)$$
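For unit stakes, formulas (21), (23), (24) and (25) can be checked against a direct solution of the ruin problem. The sketch below is an illustration only: it solves the recurrence h(k) = p·h(k+1) + q·h(k−1), with h(0) = 0 and h(A+B) = 1, by elimination in exact fractions, where h(k) is taken to be the Player's chance when he holds k units. The function names and the sample values of p, A and B are assumptions.

```python
from fractions import Fraction

def chance_by_elimination(A, B, p):
    """Player's chance h(A) from h(k) = p*h(k+1) + q*h(k-1), h(0) = 0, h(A+B) = 1."""
    q = 1 - p
    N = A + B
    c = [Fraction(0)]                 # c[k] satisfies h(k) = c[k] * h(k+1); c[0] = 0 since h(0) = 0
    for k in range(1, N):
        c.append(p / (1 - q * c[k - 1]))
    h = Fraction(1)                   # h(N) = 1
    for k in range(N - 1, A - 1, -1): # back-substitute down to h(A)
        h = c[k] * h
    return h

def chance_by_formula(A, B, alpha):
    """Formula (21): (alpha^A - 1) / (alpha^(A+B) - 1)."""
    return (alpha ** A - 1) / (alpha ** (A + B) - 1)

p = Fraction(36, 73)                  # sample unfavourable chance
alpha = (1 - p) / p                   # formula (25) for a = b = 1
A = B = 5
direct = chance_by_elimination(A, B, p)
closed = chance_by_formula(A, B, alpha)
print(direct, closed, 1 / (alpha ** A + 1))
```

Exact fractions are used so that the agreement is an identity rather than a matter of rounding.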




As an example of these principles let us examine the game 
of roulette, as played at Monte Carlo. Our description is 
taken from Sir Hiram Maxim.* 

‘The roulette consists of a large circular basin, about 2 feet 
in diameter, with the outer rim turned inward. The bottom 
of the basin, which forms the wheel, is of metal, quite separate 
from the rim or sides, and is nicely balanced on a fine pivot, 
so that when set in motion, it will spin for a considerable
time. The outer edge of the wheel is accurately divided into
thirty-seven sections or pockets, eighteen of which are painted
red and eighteen black. One is called zero and is neutral in
colour. The other pockets are numbered from 1 to 36.'

The wheel is set in motion, and a small ball started rolling 
around the edge in the opposite direction. The game consists 
in betting on the colour or number of the division in which the 
ball comes to rest. There are fourteen different methods of 
staking; the simplest are, red or black, odd or even, above 
or below 18. In each of these cases, the Player and Banker 
put up equal sums. If a player stake upon a single number,
the Banker puts up 35 times the amount. The upper limit 
for a stake on a simple chance, as red or black, is, or was, 
6,000 francs, whereas on a single number the limit was 180 
francs. When the ball rolls into the zero, the player on 
a simple chance may either forfeit one half of his stake, or 
leave it ‘en prison’ till the next turn. If he be fortunate 
in this turn, he saves his stake, but gets nothing from the 
Banker; if he lose, the stake is gone for ever. Those who bet
on individual numbers or combinations, lose all when the zero 
appears. 

Let us calculate the probability favourable to the Player. 
We shall imagine that he is wise enough (and rich enough) to 
stake 6,000 francs each time. We may also disregard the
possibility of zero appearing twice in succession, as this will
certainly be very rare. Then in sets of 74 turns each, the
average result will be 36 reds, 36 blacks, 1 zero followed by
a gain, which does not count, and 1 zero followed by a loss.

Hence
$$p = \tfrac{36}{73},\qquad q = \tfrac{37}{73},\qquad \alpha = q/p = \tfrac{37}{36}.$$


* Monte Carlo Facts and Fancies, London, 1906, pp. 257 


The Player's chance is
$$\frac{\left(\frac{37}{36}\right)^{A} - 1}{\left(\frac{37}{36}\right)^{A+B} - 1}$$
where A and B are the two fortunes, reckoned in terms of
6,000 francs as a unit. When they are equal, his chance is
$$\frac{1}{\left(\frac{37}{36}\right)^{A} + 1}.$$

Let us next suppose that the Player stakes on a number.
Has he a better or worse chance? Common sense points to
the latter, as the zero is now a sure loss, and this anticipation
is borne out by calculation. The amount which may be
staked is one thirty-third of what it was before, which
amounts to assuming that the stake remains, but that the
fortunes have been multiplied by 33. The Player's chance
will be what it was previously, if the present value of α be the
thirty-third root of 37/36, and will be less, if it be larger than
that. The thirty-third root of 37/36 is 1.00083; to find α we put

$$a = 1,\qquad b = 35,\qquad p = \tfrac{1}{37},\qquad q = \tfrac{36}{37},$$

$$(1+\epsilon)^{36} - 37\,(1+\epsilon) + 36 = 0,$$
$$1 + 36\epsilon + 630\epsilon^2 + 7140\epsilon^3 - 37 - 37\epsilon + 36 = 0,$$
$$1 + \epsilon = 1.00155.$$
This method is less favourable to the Player than was the
simple chance.
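The comparison between the simple chance and the single number can be repeated numerically by solving equation (19) directly. The following is a sketch using bisection, which is a substitute for the series expansion used in the text; the function name `unfavourable_root` is hypothetical.

```python
def unfavourable_root(a, b, p, tol=1e-14):
    """Root alpha > 1 of p*x^(a+b) - x^a + q = 0, found by bisection."""
    q = 1 - p
    f = lambda x: p * x ** (a + b) - x ** a + q
    lo = (a / ((a + b) * p)) ** (1 / b)   # turning point of f, where f is negative
    hi = 2 * lo
    while f(hi) <= 0:                      # widen until the sign changes
        hi *= 2
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

alpha_simple = unfavourable_root(1, 1, 36 / 73)   # simple chance, expected near 37/36
alpha_number = unfavourable_root(1, 35, 1 / 37)   # stake on a single number
equivalent = alpha_simple ** (1 / 33)             # thirty-third root of 37/36

print(alpha_simple, equivalent, alpha_number)
```

The root for the single number comes out close to the figure obtained from the truncated expansion, and in any case larger than the thirty-third root of 37/36, which is the point of the comparison.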

Let us now look at matters from the Banker's point of
view. We have not the data available to do this really
correctly, but the method of handling a supposititious case
will show how a correct solution might be obtained. Let us
consider a set of runs, each of 1,100 turns, and inquire into the
chance that the Banker should come out the loser in one run.
Let us, for simplicity, assume that there are 200 players, each 
staking an average sum which we take as the unit. As 
a matter of observation, not all players follow the same 
system. Some bet on red because they believe it is ‘ Red’s 
day’, others for precisely the same reason bet on black, as 
they think it is time for black to appear to even things up. 
Some prospective players will sit for a long time observing 
the runs, and not betting at all, until they have made up 
their minds what is happening, or going to happen. These 
patient watchers are quite as welcome as the rasher players, 




and quite as unlucky. However, owing to the variety of 
motives, we shall come near enough to the truth if we assume 
that the 200 players are divided by lot into those who back 
red, and those who back black. The game, therefore, amounts 
to this. When the zero appears, the Banker gets one half of
the stakes of all the players. When zero does not turn up,
the reds pay the blacks, or the blacks pay the reds, and the
Banker receives or makes good the difference. When a coin
is thrown 200 times the average numerical discrepancy, as
given by (16), is $0.798\sqrt{200 \times \tfrac14} = 6$, so that we may assume
that, on the average, the reds and blacks will offset each
other, except for twelve players, with whom the Banker must
reckon.

In 1,100 turns there will be, on an average, 30 zeros.
When a zero turns up, the Banker will collect a half unit
from each player, the average winnings from the zeros will be

$$\tfrac12 \times 200 \times 30 = 3{,}000.$$

To come out behind in the run of 1,100 turns, the Banker
must have an adverse discrepancy of $\tfrac16 \times 3{,}000 = 500$ turns.
The chance for this is

500 


Set (ee ———.) = 1_ 10 (22 
oe: Cree a ba 


which is so small as to be utterly negligible. 

There is one more problem in ruin which is worth notice.
Assuming that A = ma, what is the probability that the
Player will be ruined exactly on the μth turn? This is the
probability, that in the first μ − 1 turns he will win exactly
(μ − m)/2 times, and lose exactly (μ + m − 2)/2 times, and
that he thereupon loses the μth turn. This will be *

$$\frac{(\mu - 1)!}{\left(\dfrac{\mu - m}{2}\right)!\,\left(\dfrac{\mu + m - 2}{2}\right)!}\;p^{(\mu - m)/2}\,q^{(\mu + m)/2}.$$


* Incorrectly given by Bertrand, loc. cit., p. 123, 


Problem. 
Work out the theory of some other game according to these same 
principles. 




This expression is only correct if we assume that the
Banker is so rich that there is no possibility of his being
ruined in the interval. The sum to infinity of expressions
like this would be the probability of ruin for a player pitted
against an adversary of infinite fortune, but that probability
we have already seen is 1. Let us rather seek for what value
of μ this will be a maximum. It is to be noted that we have
here a term of the expression $q\,(p+q)^{\mu-1}$.

We get a similar expression by changing μ to μ + 2, and
equating the two we get the rather clumsy quadratic equation

$$\frac{4\mu(\mu + 1)}{(\mu + 2 - m)(\mu + m)} \times pq = 1.$$

A root of this equation will give approximately the term
desired.
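The relation between the quadratic equation and the largest term can be seen numerically. The sketch below evaluates the expression exactly as it is printed in the text (through logarithms of factorials), locates its greatest value over turns μ of the same parity as m, and compares that with the positive root of the quadratic. The values m = 10 and p = 0.4 are hypothetical.

```python
import math

def term(mu, m, p):
    """The printed expression for turn mu (mu and m of equal parity, mu >= m)."""
    q = 1 - p
    wins, losses_before = (mu - m) // 2, (mu + m - 2) // 2
    log_coeff = math.lgamma(mu) - math.lgamma(wins + 1) - math.lgamma(losses_before + 1)
    return math.exp(log_coeff + wins * math.log(p) + (losses_before + 1) * math.log(q))

def quadratic_root(m, p):
    """Positive root of 4*mu*(mu+1)*p*q = (mu+2-m)*(mu+m)."""
    q = 1 - p
    # Expanded form: (1-4pq)*mu^2 + (2-4pq)*mu + (2m - m^2) = 0.
    a2, a1, a0 = 1 - 4 * p * q, 2 - 4 * p * q, 2 * m - m * m
    if abs(a2) < 1e-12:                 # fair game: the equation is linear
        return -a0 / a1
    return (-a1 + math.sqrt(a1 * a1 - 4 * a2 * a0)) / (2 * a2)

m, p = 10, 0.4                          # hypothetical values
root = quadratic_root(m, p)
best = max(range(m, 400, 2), key=lambda mu: term(mu, m, p))
print(root, best)
```

Since the root marks the turn at which two successive terms are equal, the greatest term is found within two turns of it.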


CHAPTER IV 
MEAN VALUE AND DISPERSION 


§ 1. Elementary Theorems in Mean Value. 


In the course of the last chapter we had frequent occasion 
to solve such problems as to find the expectation of a man 
who is to receive a sum equal to the square of the discrepancy 
in a certain series of trials. Ch. III, 1] showed us how an 
expectation is the limit of an average, as the number of trials 
increases indefinitely. The reader must have suspected that 
this whole question of averages and expectation was capable 
of much fuller treatment, and that new definitions would 
help to clarify the whole matter. We now proceed to give 
our undivided attention to this task. 


Definition] If a variable take the different values $v_1, v_2, \dots v_n$ with
the respective probabilities $p_1, p_2, \dots p_n$, and these are all
the possible values for that variable, then the expression

$$\sum_{i=1}^{n} v_i\,p_i$$

is called the mean value of that variable.

Definition] If V is a function of the parameters $X_1, X_2, \dots X_n$,
which vary according to the third empirical assumption,
then the integral

$$\int\!\!\int \dots \int V\,F\,dX_1\,dX_2 \dots dX_n$$

extended over the whole range of variation, giving to
the probability a value other than 0, shall be defined as
the mean value of the variable.


We reach at once from Ch. III, 1] 

Theorem 1] The mean value of a variable is the limit of its
average value as the number of trials increases indefinitely.

Let us note in passing that the mean value of a variable
is the expectation of a man who will receive a sum equal to
this variable.

In speaking of variables throughout the present chapter,
we shall mean such variables as take mean values. We must
give one important definition in connexion with these:

Definition] Two variables shall be said to be independent if
the probability that one lie close to a given value is
independent of the value of the other.


Theorem 2] The mean value of the sum of two variables is the 
sum of their mean values. 


Let us first suppose that each can take only a finite number
of values. Let the first one take the values $x_1, x_2, \dots x_m$ with
the respective probabilities $p_1, p_2, \dots p_m$, while the second takes
the values $y_1, y_2, \dots y_n$ with the probabilities $\pi_1, \pi_2, \dots \pi_n$. Let
$F_{ij}$ be the probability that the first variable takes the value
$x_i$, while the second takes the value $y_j$. The mean value of
the sum will then be

$$\sum_{i=1}^{m}\sum_{j=1}^{n} F_{ij}\,(x_i + y_j).$$

The total coefficient of $y_j$ is $\sum_{i=1}^{m} F_{ij}$. This is the sum of the
mutually exclusive probabilities that the first variable should
take the values $x_1, x_2, \dots x_m$, while the second takes the value
$y_j$. It is therefore $\pi_j$. In the same way the total coefficient
of $x_i$ is $p_i$. The expression above is thus

$$\sum_{i=1}^{m} p_i x_i + \sum_{j=1}^{n} \pi_j y_j$$

and this is the sum of the two mean values. When one or
the other variable can take an infinite number of values we
pass from a finite sum to a definite integral by the sort of
device universally used in the integral calculus.

It is especially important to note that in this theorem there
is no assumption as to the independence of the variables. In
consequence of this, we can use mean values in cases where
the search for the actual probabilities is beyond our power.
The next theorem has a more restricted scope:


Theorem 3] The mean value of the product of two indepen- 
dent variables is the product of their mean values. 
Using the same notation as before, since $F_{ij}$ is the probability
for the simultaneous arrival of the values $x_i$ and $y_j$, we have
by our definition above

$$F_{ij} = p_i\,\pi_j.$$

Mean value is

$$\sum_{i=1}^{m}\sum_{j=1}^{n} p_i\,\pi_j\,x_i\,y_j = \left(\sum_{i=1}^{m} p_i x_i\right)\left(\sum_{j=1}^{n} \pi_j y_j\right).$$

The extension to the case where the one or the other
variable can take an infinite number of values is immediate.
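Theorems 2] and 3] can be illustrated with a small table of joint probabilities $F_{ij}$. The sketch below uses exact fractions; the values, the probabilities, and in particular the dependent table `F_dep` are hypothetical numbers chosen only so that both tables have the same marginal probabilities.

```python
from fractions import Fraction
from itertools import product

def mean(values, probs):
    """Mean value of a variable with finitely many values."""
    return sum(v * w for v, w in zip(values, probs))

xs, ps = [1, 2, 3], [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]
ys, pis = [0, 10], [Fraction(1, 4), Fraction(3, 4)]

# Independent case: F[i][j] = p_i * pi_j.
F_indep = [[p * w for w in pis] for p in ps]
# A dependent joint law with the same marginal probabilities (hypothetical numbers).
F_dep = [[Fraction(1, 4), Fraction(1, 4)],
         [Fraction(0), Fraction(1, 3)],
         [Fraction(0), Fraction(1, 6)]]

def mean_of(func, F):
    """Mean value of func(x, y) under the joint probabilities F."""
    return sum(F[i][j] * func(xs[i], ys[j])
               for i, j in product(range(len(xs)), range(len(ys))))

add = lambda x, y: x + y
mul = lambda x, y: x * y

print(mean_of(add, F_indep), mean_of(add, F_dep), mean(xs, ps) + mean(ys, pis))
print(mean_of(mul, F_indep), mean_of(mul, F_dep), mean(xs, ps) * mean(ys, pis))
```

The sum behaves alike in both tables, whereas the product agrees with the product of the mean values only in the independent one.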


Theorem 4] The mean value of the square of a variable is not
less than the square of its mean value.


Using our previous notation, we have 
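One identity that yields the theorem, given here as a sketch in the notation of Theorem 2] (with $\sum p_i = 1$) and not necessarily in the form the author displayed, is:

```latex
\sum_{i} p_i x_i^2 - \Big(\sum_{i} p_i x_i\Big)^2
  = \sum_{i} p_i \Big(x_i - \sum_{k} p_k x_k\Big)^2 \;\ge\; 0 .
```

The right-hand side is a sum of squares with non-negative weights, so the mean value of the square cannot fall below the square of the mean value.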


Problems. 
1. Prove that the mean value of the sum of k variables is the sum of 
their mean values. 
2. Prove that the mean value of the product of k independent variables 
is the product of their mean values. 




Theorem 5] The mean value of the square of the sum of n
independent variables, each of which has the mean value
0, is the sum of the mean values of their squares.

We see, in fact, that when we square our sum, we have
squared terms and product terms, and the mean value of each
of these latter is 0 by 3]. Let us go a little further in this
direction. Still assuming that the mean value of each of our
variables $x_1, x_2, \dots x_n$ is 0, let the mean value of the squares be
$A_1, A_2, \dots A_n$ respectively. We may write

$$x_1 - \frac{\sum x_i}{n} = \frac{n-1}{n}\,x_1 - \frac1n\,x_2 - \frac1n\,x_3 \dots - \frac1n\,x_n$$

and reach:

Theorem 6] Given n independent variables $x_1, x_2, \dots x_n$, each
with the mean value 0, while the mean values of their
respective squares are $A_1, A_2, \dots A_n$, then the mean value
of the expression

$$\sum_{i=1}^{n}\left(x_i - \frac{\sum x_j}{n}\right)^2$$

is

$$\frac{n-1}{n}\sum_{i=1}^{n} A_i.$$


Let us next look at variables whose mean values are not 0.
Using the same notation as before, let us assume that the
mean values are $a_1, a_2, \dots a_n$. Since

$$(x_i - a_i)^2 = x_i^2 - 2a_i x_i + a_i^2$$
we have:
Theorem 7] If the mean value of a variable $x_i$ be $a_i$, while the
mean value of its square is $A_i$, then the mean value of
$(x_i - a_i)^2$ is $A_i - a_i^2$.
Theorem 8] Given n independent variables $x_1, x_2, \dots x_n$ whose
mean values are $a_1, a_2, \dots a_n$, while the mean values of
their squares are $A_1, A_2, \dots A_n$ respectively, then the mean
value of

$$\left[\sum_{i=1}^{n}(x_i - a_i)\right]^2$$
is
$$\sum_{i=1}^{n}(A_i - a_i^2).$$
In the further discussion of these quantities, let us assume
that each can take but a finite number of values. For
instance, let $x_i$ take the values $x_{i1}, x_{i2}, \dots$ with the respective
probabilities $p_{i1}, p_{i2}, \dots$. Our theorem 8] may
be expressed by the equation

$$\sum_{g,h,k,\dots}(x_{1g} + x_{2h} + x_{3k} + \dots - a_1 - a_2 - a_3 - \dots)^2\,p_{1g}\,p_{2h}\,p_{3k}\dots = \sum_{i=1}^{n}(A_i - a_i^2),$$

$$\sum_{g,h,k,\dots}\frac{t^2\,(x_{1g} + x_{2h} + x_{3k} + \dots - a_1 - a_2 - a_3 - \dots)^2}{n\sum (A_i - a_i^2)}\,p_{1g}\,p_{2h}\,p_{3k}\dots = \frac{t^2}{n}.$$

On the left, let us leave out all terms where

$$\frac{t^2\,(x_{1g} + x_{2h} + x_{3k} + \dots - a_1 - a_2 - a_3 - \dots)^2}{n\sum (A_i - a_i^2)} \le 1$$

and replace this expression by 1 when it is greater than that.
We have thus a quantity distinctly less than $t^2/n$ which
represents the probability that this expression should be
greater than unity. Taking the contrary probability we have:
Tchebycheff's inequality *] Given n independent variables
$x_1, x_2, \dots x_n$ whose mean values are $a_1, a_2, \dots a_n$, while the
mean values of their squares are $A_1, A_2, \dots A_n$ respectively,
then the probability that the difference between the
average of these quantities, and the mean value of this
average, which is the average of their mean values, shall
differ from 0 by a quantity numerically not greater
than
$$\frac1t\sqrt{\frac{\sum (A_i - a_i^2)}{n}}$$
is greater than $1 - \dfrac{t^2}{n}$.
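The inequality can be tested on a case where the exact probability is computable. The sketch below takes n = 20 fair dice and t = 2, both hypothetical choices, builds the exact distribution of the total by repeated convolution, and compares the probability of the stated event with the guaranteed bound 1 − t²/n.

```python
from fractions import Fraction
import math

def sum_distribution(n, faces=6):
    """Exact distribution of the total shown by n fair dice, as {total: probability}."""
    dist = {0: Fraction(1)}
    for _ in range(n):
        nxt = {}
        for total, w in dist.items():
            for face in range(1, faces + 1):
                nxt[total + face] = nxt.get(total + face, 0) + w / faces
        dist = nxt
    return dist

n, t = 20, 2                            # hypothetical choices
a_i = Fraction(7, 2)                    # mean value of one die
A_i = Fraction(91, 6)                   # mean value of its square
# (1/t) * sqrt(sum(A_i - a_i^2) / n); all dice alike, so the average is A_i - a_i^2.
bound = (1 / t) * math.sqrt(A_i - a_i ** 2)

dist = sum_distribution(n)
prob_within = sum(w for total, w in dist.items()
                  if abs(total / n - float(a_i)) <= bound)
guaranteed = 1 - Fraction(t * t, n)
print(float(prob_within), float(guaranteed))
```

As is usual with this inequality, the guaranteed figure is far below the true probability; the bound is safe rather than sharp.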


* Tchebycheff, Œuvres, Petrograd, 1899, vol. i, p. 687.




In applying this inequality, we note that the expression
$$\frac{\sum (A_i - a_i^2)}{n}$$
will vary with n only between fixed limits; we may therefore
take t so large that the expression
$$\frac1t\sqrt{\frac{\sum (A_i - a_i^2)}{n}}$$
is as small as we please. Then we may take n so large that
the probability is as near to unity as we wish. In other
words, the inequality tells us that by taking enough variables,
we are almost certain that the difference between the average
and the mean value of this average, which is the average of
the mean values, shall be extremely small. The simplest case
is where all the mean values are the same, and the inequality
tells us that there is a very large probability that the difference
between the observed average and the mean value shall be
very small, which is, after all, a restatement of Ch. III 1]. It
also leads to:

Poisson's Law of Large Numbers*] If an event be tried
repeatedly with the probabilities $p_1, p_2, \dots$ for success,
which may be constant, or may vary with each trial,
then if the number of trials increase indefinitely, the
probability that the difference between the average
probability and the observed ratio of success will differ by
less than any assigned quantity approaches 1 as a limit.

Let each variable take the value 1 when the event succeeds,
0 when it fails. The mean value of the ith variable is thus
$p_i$. Tchebycheff's inequality tells us that we have a probability
above $1 - t^2/n$ that

$$\left|\,\text{average number of successes} - \frac{\sum p_i}{n}\,\right| < \frac1t\sqrt{\frac{\sum p_i q_i}{n}}.$$

Now
$$p_i + q_i = 1,\qquad (p_i - q_i)^2 \ge 0,\qquad p_i q_i \le \tfrac14.$$

Hence we have a probability $> 1 - t^2/n$ that
$$\left|\,\text{average number of successes} - \frac{\sum p_i}{n}\,\right| < \frac{1}{2t}.$$

No matter how small 1/2t may be, we may make $t^2/n$ as
small as we please.

* Poisson, 'Recherches sur la probabilité des jugements', Comptes Rendus de
l'Académie des Sciences, vol. ii, 1835. Bertrand (loc. cit., p. xxxii) comments as


§ 2. Dispersion.* 


Suppose that we have n measurements of the same object,
or different objects $y_1, y_2, \dots y_n$, where

$$\bar{y} = \frac1n\sum_{i=1}^{n} y_i.$$

Then the expression

$$\sqrt{\frac1n\sum_{i=1}^{n}(y_i - \bar{y})^2} \qquad (1)$$

is called their dispersion or standard deviation. Let us find
its mean value. We shall use the previous notation for the
mean value of one of our variables and for the mean value
of its square, and write also

$$a = \frac1n\sum_{i=1}^{n} a_i.$$

The square of the dispersion is

$$\frac1n\sum_{i=1}^{n}(y_i - \bar{y})^2 = \frac1n\sum_{i=1}^{n}\left\{\big((y_i - a_i) - (\bar{y} - a)\big) + [a_i - a]\right\}^2.$$
"¢=1 oil 
follows: 'Tel est le résumé fait par Poisson lui-même d'une découverte qui
se distingue bien peu des lois connues du hasard, et à laquelle il a, à peu
près seul, je crois, attaché une grande importance.'

* The first part of the present section will be found in an article by the
author, 'On the Dispersion of Observations', Bulletin American Math. Soc.,
vol. xxvii, 1921.

The mean value of each large round bracket is 0, as is the
mean value of the product of a large round bracket and
a square bracket. The mean value of the square of a square
bracket is its ostensible value. When it comes to finding
the mean value of the square of the large round bracket, we
may apply 6] and 7]. This brings us to the


Fundamental Dispersion Theorem] If n independent quantities
be given $y_1, y_2, \dots y_n$, whose mean values are $a_1, a_2, \dots a_n$,
while the mean values of their squares are $A_1, A_2, \dots A_n$
respectively, and if the average of the variables is $\bar{y}$,
while the average of the mean values is a, then the mean
value of the square of the dispersion is

$$\frac1n\sum_{i=1}^{n}\left\{\frac{n-1}{n}\,(A_i - a_i^2) + (a_i - a)^2\right\}.$$

In practice we make two approximations. Firstly, when n
is reasonably large we replace (n − 1)/n by 1; secondly, in
accordance with Tchebycheff's inequality, since the square of
the dispersion is an average, we replace its mean value by the
observed value, thus getting the fundamental dispersion
equation

$$\frac1n\sum_{i=1}^{n}(y_i - \bar{y})^2 = \frac1n\left[\sum_{i=1}^{n}(A_i - a_i^2) + \sum_{i=1}^{n}(a_i - a)^2\right]. \qquad (2)$$
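The Fundamental Dispersion Theorem, before the two approximations are made, can be confirmed exactly on a small case by enumerating every joint outcome. In this sketch the three probability laws are hypothetical, and exact fractions are used so that the two sides can be compared without rounding.

```python
from fractions import Fraction
from itertools import product

# Three independent quantities, each given as (values, probabilities); hypothetical laws.
laws = [
    ([0, 1], [Fraction(1, 2), Fraction(1, 2)]),
    ([1, 4], [Fraction(1, 3), Fraction(2, 3)]),
    ([2, 3, 7], [Fraction(1, 4), Fraction(1, 4), Fraction(1, 2)]),
]
n = len(laws)

def squared_dispersion(ys):
    """(1/n) * sum of (y_i - y_bar)^2 for one set of observed values."""
    y_bar = Fraction(sum(ys), n)
    return sum((y - y_bar) ** 2 for y in ys) / n

# Left side: mean value of the squared dispersion, by enumerating every joint outcome.
lhs = Fraction(0)
for outcome in product(*[list(zip(v, w)) for v, w in laws]):
    weight = Fraction(1)
    for _, w in outcome:
        weight *= w                      # independence: probabilities multiply
    lhs += weight * squared_dispersion([y for y, _ in outcome])

# Right side: (1/n) * sum[ (n-1)/n * (A_i - a_i^2) + (a_i - a)^2 ].
a_i = [sum(v * w for v, w in zip(*law)) for law in laws]
A_i = [sum(v * v * w for v, w in zip(*law)) for law in laws]
a = sum(a_i) / n
rhs = sum(Fraction(n - 1, n) * (A - m * m) + (m - a) ** 2
          for A, m in zip(A_i, a_i)) / n

print(lhs, rhs)
```

With n as small as 3 the factor (n − 1)/n matters; equation (2) drops it, which is harmless only when n is reasonably large.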

The reader will not forget that this equation is merely an
approximation. No equation connecting observed quantities
with mean values can be exact. Let us make some applications
of this. Suppose that we have N sets, each of s
observations

$$x_{11},\ x_{12},\ \dots\ x_{1s}, \qquad \sum_{j} x_{1j} = x_1,$$
$$\dots\dots$$
$$x_{N1},\ x_{N2},\ \dots\ x_{Ns}, \qquad \sum_{j} x_{Nj} = x_N.$$

Let $a_{ij}$ be the mean value of $x_{ij}$, while $A_{ij}$ is the mean
value of its square.


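By (2), for the ith set,

$$\sum_{j=1}^{s}\left(x_{ij} - \frac{x_i}{s}\right)^2 = \sum_{j=1}^{s}(A_{ij} - a_{ij}^2) + \sum_{j=1}^{s}\left(a_{ij} - \frac{a_i}{s}\right)^2.$$

Summing again:

$$\sum_{i,j}\left(x_{ij} - \frac{x_i}{s}\right)^2 = \sum_{i,j}(A_{ij} - a_{ij}^2) + \sum_{i,j}\left(a_{ij} - \frac{a_i}{s}\right)^2.$$

Again
$$x_i - a_i = \sum_{j=1}^{s}(x_{ij} - a_{ij}).$$
Hence by 5] and 8] the mean value of $(x_i - a_i)^2$ is
$$\sum_{j=1}^{s}(A_{ij} - a_{ij}^2).$$

Mean value of $x_i^2$ = mean value of $(x_i - a_i)^2 + a_i^2$, since $a_i$
is mean value of $x_i$.
Applying (2) again, with $\bar{x} = \sum x_i / N$ and $a = \sum a_i / N$:

$$\sum_{i=1}^{N}(x_i - \bar{x})^2 = \sum_{i,j}(A_{ij} - a_{ij}^2) + \sum_{i=1}^{N}(a_i - a)^2.$$

Eliminating $\sum (A_{ij} - a_{ij}^2)$ between this equation and the last
one which contained it, we have

$$\sum_{i,j}\left[\left(x_{ij} - \frac{x_i}{s}\right)^2 - \left(a_{ij} - \frac{a_i}{s}\right)^2\right] = \sum_{i=1}^{N}\left[(x_i - \bar{x})^2 - (a_i - a)^2\right]. \qquad (3)$$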
In practice we recognize three types of groups of observations.

A. Bernoulli series. All of the observations are supposed
to bear upon the same quantity, or, at least, the mean values
are all the same. The differences of observed values would
thus be purely accidental. Here
$$a_{ij} = a_i/s,\qquad a_i = a,$$
$$\sum_{i,j}\left(x_{ij} - \frac{x_i}{s}\right)^2 = \sum_{i}(x_i - \bar{x})^2.$$
Such a series is said to have normal dispersion.

B. Lexis series. All observations in the same set are
supposed to be on the same quantity, but the quantity varies
from one set to another.
$$a_{ij} = a_i/s,\qquad a_i \ne a,$$
$$\sum_{i,j}\left(x_{ij} - \frac{x_i}{s}\right)^2 < \sum_{i}(x_i - \bar{x})^2.$$
This series is said to have supernormal dispersion.

C. Poisson series. Here we suppose that within a set there
is some difference among the objects, but that all sets are
comparable
$$a_{ij} \ne a_i/s,\qquad a_i = a,$$
$$\sum_{i,j}\left(x_{ij} - \frac{x_i}{s}\right)^2 > \sum_{i}(x_i - \bar{x})^2.$$
This series is said to have subnormal dispersion.
What we can do in practice is this. We calculate the two
quantities

$$\sum_{i,j}\left(x_{ij} - \frac{x_i}{s}\right)^2,\qquad \sum_{i}(x_i - \bar{x})^2.$$

If they be virtually equal, we are sure that the members of
one set cannot all be the same, unless all the sets are the
same, and vice versa. If the first be less than the second,
the different sets cannot be all the same. If the first be
greater than the second, there must be a variation within
a set.
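The practical test just described can be put in a few lines. The sketch below computes the two quantities for a table of N sets of s observations and names the type of dispersion; the 10 per cent tolerance standing for 'virtually equal' and the sample tables are assumptions, not part of the text.

```python
def dispersion_quantities(table):
    """table[i][j] is observation j of set i; returns the two quantities compared in the text."""
    N, s = len(table), len(table[0])
    totals = [sum(row) for row in table]                 # x_i, the total of set i
    grand = sum(totals) / N                              # average of the x_i
    within = sum((x - totals[i] / s) ** 2
                 for i, row in enumerate(table) for x in row)
    between = sum((x_i - grand) ** 2 for x_i in totals)
    return within, between

def classify(table, tolerance=0.1):
    """Name the dispersion; `tolerance` is an assumed reading of 'virtually equal'."""
    within, between = dispersion_quantities(table)
    if abs(within - between) <= tolerance * max(within, between):
        return "normal"
    return "supernormal" if within < between else "subnormal"

# Hypothetical tables: sets that differ among themselves, and sets that are all alike.
sets_differ = [[1, 1, 1], [5, 5, 5], [9, 9, 9]]
sets_alike = [[1, 5, 9], [1, 5, 9], [1, 5, 9]]
print(classify(sets_differ), classify(sets_alike))
```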

As an example, we give the observations for precipitation
in inches by month in New York City—

Jan.   Feb.   Mar.   Apr.   May    June   July   Aug.   Sept.  Oct.   Nov.   Dec.   Year
4.18   5.16   3.18   2.06   4.08   3.36   4.33   2.69   2.36   4.17   4.26   1.98   41.78
2.07   0.86   5.18   6.82   7.01   0.94   5.41   6.88   2.33   2.20   1.31   6.05   47.06
2.98   5.78   4.82   3.51   1.23   5.91   3.12   3.29   3.59   6.66   1.19   6.19   47.07
8.44   3.88   3.65   2.88   0.338  7.42   3.33   5.96   2.60   1.55   0.90   2.81   48.60
8.88   2.18   3.44   8.94   1.61   2.70   4.31   7.18   3.18   3.21   2.62   3.87   41.57
8.98   2.79   8.65   2.45   1.12   4.18   6.01   5.28   7.11   2.67   1.67   3.67   44.48
2.98   2.57   6.58   5.78   4.67   1.70   3.21   3.68   2.54   4.30   1.28   3.53   41.82
3.26   1.52   3.80   3.89   4.08   3.29   1.18   2.48   8.00   8.82   5.05   3.91   45.28
8.84   5.86   2.15   1.82   9.10   1.70   4.33   5.65   1.60   1.92   0.75   3.21   41.43
3.88   4.81   3.19   6.98   1.72   3.17   1.98   7.94   2.66   0.74   1.58   5.00   41.55


3-46 


70 MEAN VALUE AND DISPERSION 


\[ \sum_{i=1}^{10} (a_i - a)^2 = 69.47. \]


This has the appearance of a Poisson series, and we
conclude that the rainfall in New York shows a greater
tendency to vary month by month than year by year,
a rather natural result.

The most frequent applications of these tests are to the
observations of probabilities or frequency ratios, to see whether
they vary from case to case or from set to set.* Let the
generic letter for one of our probabilities be $p_{ij}$, and let this
represent the probability that $x_{ij}$ takes the value 1, while in
the contrary case it takes the value 0. Then $a_i/s$ is the
average probability for the $i$th set, and we may put

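\[ s p_i = a_i = \sum_{j=1}^{s} p_{ij}, \qquad N p = \sum_{i=1}^{N} p_i, \]
\[ \text{mean value of } (x_{ij} - p_{ij})^2 = p_{ij} - p_{ij}^2. \]
By the equation preceding (3)
\[ \sum_{i=1}^{N} (x_i - x)^2 = \sum_{i,j=1}^{N,\,s} (p_{ij} - p_{ij}^2) + s^2 \sum_{i=1}^{N} (p_i - p)^2, \]
\[ \sum_{j=1}^{s} [p_{ij} - p_i]^2 = \sum_{j=1}^{s} p_{ij}^2 - s p_i^2; \qquad \sum_{j=1}^{s} p_{ij}^2 = \sum_{j=1}^{s} [p_{ij} - p_i]^2 + s p_i^2. \]
Similarly
\[ \sum_{i=1}^{N} p_i^2 = \sum_{i=1}^{N} (p_i - p)^2 + N p^2, \]
\[ \sum_{i=1}^{N} (x_i - x)^2 = \sum_{i,j=1}^{N,\,s} \left[ p_{ij} - (p_{ij} - p_i)^2 \right] - s \sum_{i=1}^{N} p_i^2 + s^2 \sum_{i=1}^{N} (p_i - p)^2 \]
\[ = N s p - N s p^2 - \sum_{i,j=1}^{N,\,s} (p_{ij} - p_i)^2 + (s^2 - s) \sum_{i=1}^{N} (p_i - p)^2. \]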


* See Fisher, loc. cit., pp. 117 ff., also Forsyth, ‘Simple Derivation for the 
Formulas for the dispersion of Statistical Series’, American Math. Monthly, 
vol, xxxi, 1924, 




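\[ \frac{1}{N} \sum_{i=1}^{N} (x_i - x)^2 = spq - \frac{1}{N} \sum_{i,j=1}^{N,\,s} (p_{ij} - p_i)^2 + \frac{s^2 - s}{N} \sum_{i=1}^{N} (p_i - p)^2. \]

Bernoulli series: $p_{ij} = p_i = p$;
\[ \frac{1}{N} \sum_{i=1}^{N} (x_i - x)^2 = spq. \]

Lexis series: $p_{ij} = p_i$, $p_i \neq p$;
\[ \frac{1}{N} \sum_{i=1}^{N} (x_i - x)^2 = spq + \frac{s^2 - s}{N} \sum_{i=1}^{N} (p_i - p)^2. \]

Poisson series: $p_{ij} \neq p_i$, $p_i = p$;
\[ \frac{1}{N} \sum_{i=1}^{N} (x_i - x)^2 = spq - \frac{1}{N} \sum_{i,j=1}^{N,\,s} (p_{ij} - p)^2. \]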
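The three cases can be checked on small numerical instances. The following is only a sketch: the function names and the three sample tables of probabilities are illustrative assumptions, not part of the text. It computes the mean value of $\frac{1}{N}\sum (x_i - x)^2$ set by set, and again from the closed formula, using exact fractions.

```python
from fractions import Fraction as F

def dispersion_direct(p):
    """Mean value of (1/N) * sum_i (x_i - s*pbar)^2, computed set by set.

    p is an N x s table of probabilities p[i][j] (hypothetical input)."""
    N, s = len(p), len(p[0])
    p_i = [sum(row) / s for row in p]          # average probability of each set
    pbar = sum(p_i) / N                        # general average p
    total = F(0)
    for row, pi in zip(p, p_i):
        variance = sum(pij * (1 - pij) for pij in row)   # sum_j p_ij (1 - p_ij)
        total += variance + (s * pi - s * pbar) ** 2
    return total / N

def dispersion_formula(p):
    """The same mean value from spq - (1/N) S1 + ((s^2 - s)/N) S2."""
    N, s = len(p), len(p[0])
    p_i = [sum(row) / s for row in p]
    pbar = sum(p_i) / N
    within = sum((pij - pi) ** 2 for row, pi in zip(p, p_i) for pij in row)
    between = sum((pi - pbar) ** 2 for pi in p_i)
    return s * pbar * (1 - pbar) - within / N + (s * s - s) * between / N

# Three illustrative tables, each with general average p = 1/2 and s = 4.
bernoulli = [[F(1, 2)] * 4 for _ in range(3)]
lexis = [[F(1, 4)] * 4, [F(1, 2)] * 4, [F(3, 4)] * 4]
poisson = [[F(1, 4), F(3, 4), F(1, 4), F(3, 4)] for _ in range(3)]

for name, table in (("Bernoulli", bernoulli), ("Lexis", lexis), ("Poisson", poisson)):
    print(name, dispersion_direct(table), dispersion_formula(table))
```

With these sample tables $spq = 1$; the Lexis table gives a larger value and the Poisson table a smaller one, in agreement with the names supernormal and subnormal.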


There are cases where a study of the mean value of the
squared dispersion or discrepancy brings out the differences
between two series of trials which are otherwise seemingly
alike. Let us return to our problem of repeated trials, so
thoroughly discussed in the last chapter. Let us first have
$n_1$ trials, with a constant probability $p_1$ of success, then $n_2$
trials, with a probability $p_2$, &c. The mean value of the
number of successes will be $\sum_i n_i p_i$.

The total discrepancy will be
\[ (r_1 - n_1 p_1) + (r_2 - n_2 p_2) + \dots, \]
where $r_i$ is the number of successes in the $i$th series.
Since the mean value of the product of any two of the bracketed
expressions is 0, the mean value of the square of this
discrepancy is the sum of the mean values of the squares of the
discrepancies of the individual series, i.e. $\sum_i n_i p_i - \sum_i n_i p_i^2$.

Suppose, secondly, we take $n = \sum_i n_i$ trials of an event,
where the probability for success is
\[ p = \frac{\sum_i n_i p_i}{\sum_i n_i}. \]


Problem. 
Work out another set of observations according to this same plan, 




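The mean value for the number of successes will be as
before. The mean value for the square of the discrepancy is
\[ np(1-p) = \sum_i n_i p_i - \frac{\left(\sum_i n_i p_i\right)^2}{\sum_i n_i} \]
\[ = \sum_i n_i p_i - \sum_i n_i p_i^2 + \frac{\sum_i n_i \sum_i n_i p_i^2 - \left(\sum_i n_i p_i\right)^2}{\sum_i n_i} \]
\[ = \sum_i n_i p_i - \sum_i n_i p_i^2 + \frac{\sum_{i<j} n_i n_j (p_i - p_j)^2}{\sum_i n_i} \ge \sum_i n_i p_i - \sum_i n_i p_i^2. \]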


We see that in the second case the mean value for the 
number of successes is the same, but that for the squared 
discrepancy is greater. 
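A small exact computation illustrates the comparison. This is a sketch only; the function names and the sample values of the $n_i$ and $p_i$ are assumptions chosen for illustration.

```python
from fractions import Fraction as F

def mixed_series_msd(n, p):
    """Mean squared discrepancy when n[i] trials are made with probability p[i]."""
    return sum(ni * pi * (1 - pi) for ni, pi in zip(n, p))

def pooled_series_msd(n, p):
    """Mean squared discrepancy for sum(n) trials at the averaged probability."""
    total = sum(n)
    pbar = sum(ni * pi for ni, pi in zip(n, p)) / total
    return total * pbar * (1 - pbar)

def excess(n, p):
    """The non-negative term sum_{i<j} n_i n_j (p_i - p_j)^2 / sum(n)."""
    k = len(n)
    pairs = sum(n[i] * n[j] * (p[i] - p[j]) ** 2
                for i in range(k) for j in range(i + 1, k))
    return pairs / sum(n)

# Illustrative values (not from the text).
n = [10, 20, 30]
p = [F(1, 5), F(1, 2), F(4, 5)]
print(mixed_series_msd(n, p), pooled_series_msd(n, p), excess(n, p))
```

For these values the first case gives 57/5 and the second 72/5, the difference being exactly the pairwise term.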

Here is an even more instructive example of the same
kind.* The problem of repeated trials may be stated in the
following way. An urn contains a large number $N$ of balls,
of which $Np$ are white and $Nq$ are black. A ball is taken
out and replaced $n$ times in succession: what is the probability
for seeing just $r$ white and $n-r$ black balls? This problem
we have solved completely. We now take up the analogous
problem where the ball extracted is not replaced. The
probability for just $r$ whites and $n-r$ blacks is now
\[ \frac{(Np)!}{r!\,(Np-r)!} \cdot \frac{(Nq)!}{(n-r)!\,[Nq-(n-r)]!} \div \frac{N!}{n!\,(N-n)!}. \]
This is a maximum with
\[ \frac{1}{r!\,(Np-r)!} \cdot \frac{1}{(n-r)!\,[Nq-(n-r)]!}. \]
The ratio of this to the next term is
\[ \frac{r+1}{Np-r} \cdot \frac{N(1-p)+1-(n-r)}{n-r}, \]
which is very close to
\[ \frac{r}{Np-r} \cdot \frac{Nq-(n-r)}{n-r}, \]
and this will be 1 when $r = np$.
The most likely number of white balls will be as before.


* Castelnuovo, loc. cit., p. 41, 




Let us find the mean number of white balls. This is the
expectation of a man who shall receive one pistole for every
white ball that appears, and nothing for a black one, and
this is the sum of his expectations from the individual balls
drawn, and so is $p + p + p + \dots = np$, and this is just the mean
value for the number of successes that we got before. Now let
us find the mean value for the square of the discrepancy.
Let $X_i$ take the value 1 if the $i$th ball be white, 0 if it be black.
Then the mean value of $\sum X_i$ is the mean value just found.

Furthermore, let $Y_i = X_i - p$. We wish to find the mean
value of $(\sum Y_i)^2$.
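We have the following table of values:

$Y_i = q$             probability $p$
$Y_i = -p$            probability $q$
$Y_i^2 = q^2$         probability $p$
$Y_i^2 = p^2$         probability $q$
$Y_i Y_j = q^2$       probability $\dfrac{Np\,(Np-1)}{N(N-1)}$
$Y_i Y_j = -pq$       probability $\dfrac{2\,Np \cdot Nq}{N(N-1)} = \dfrac{2Npq}{N-1}$
$Y_i Y_j = p^2$       probability $\dfrac{Nq\,(Nq-1)}{N(N-1)}$

The mean value of $Y_i^2$ is thus $pq$, and that of $Y_i Y_j$ is $-pq/(N-1)$.
Hence the mean value of $(\sum Y_i)^2$ is
\[ npq - \frac{n(n-1)\,pq}{N-1}. \]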
Comparing this experiment with that where the balls are
replaced, we see that the most likely number of white
balls, and the mean number, are the same, but the mean
value of the square of the discrepancy is decreased, and we
should expect to see less dispersion.
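The result may be verified by direct enumeration for a small urn. The sketch below is illustrative only: the function name and the urn of 20 balls are assumptions. It sums over every possible number of white balls with exact fractions.

```python
from fractions import Fraction as F
from math import comb

def without_replacement_moments(N, white, n):
    """Mean number of white balls, and mean squared discrepancy from n*p,
    when n balls are drawn without replacement from N balls, `white` of them white."""
    p = F(white, N)
    total = comb(N, n)
    mean = F(0)
    msd = F(0)
    for r in range(max(0, n - (N - white)), min(n, white) + 1):
        prob = F(comb(white, r) * comb(N - white, n - r), total)
        mean += prob * r
        msd += prob * (r - n * p) ** 2
    return mean, msd

# Illustrative urn: 20 balls, 8 white, 5 drawn.
N, white, n = 20, 8, 5
p, q = F(white, N), F(N - white, N)
mean, msd = without_replacement_moments(N, white, n)
print(mean, msd, n * p * q)   # mean np, reduced dispersion, value with replacement
```

Here the mean is $np = 2$, while the mean squared discrepancy is 18/19 against $npq = 6/5$ with replacement.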


CHAPTER V 
GEOMETRICAL PROBABILITY 


In the third empirical assumption of the first chapter, we
assumed that when an event depended upon $n$ independent
variables, varying in an $n$-dimensional continuum, there
existed such an analytic function $F$ that the probability that
the variables should take values lying in an $n$-dimensional
sub-manifold was expressed by the integral
\[ \int\!\!\int \cdots \int F \, dX_1 \, dX_2 \dots dX_n \]
extended over that manifold. By a proper change of variables
we then saw that this probability might be expressed by
the ratio
\[ \frac{\int\!\!\int \cdots \int dx_1 \, dx_2 \dots dx_n}{\int\!\!\int \cdots \int dx_1 \, dx_2 \dots dx_n}, \qquad (1) \]


where the integration in the numerator is over the correspond- 
ing sub-manifold for the new variables, while that in the 
denominator is over the total field of possible variation. 

The great difficulty in handling problems in this continuous
or geometrical probability consists in determining which
variables to take in order to express the probability in the
form (1). This difficulty can be brought out most clearly by
one or two specific examples. Here is a variation on one
that appeared in the first chapter. Suppose that a number is
chosen at random between 1 and 3: what is the probability
that it lies between 1 and 2? The natural mode of procedure
is as follows. All regions of the interval being supposed
equally plausible, we take the number itself as the independent
variable, which amounts to assuming that the probability
that it lies within an interval is proportional to the length
of that interval. Thus, for our particular problem, the
probability sought is
\[ \frac{\int_1^2 dx}{\int_1^3 dx} = \frac{1}{2}. \]

Leaving this answer for a moment, let us next assume that
a number is chosen at random between 1 and $\frac13$: what is the
probability that it lies between 1 and $\frac12$? Following the
same reasoning as before, we have
\[ \frac{\int_{1/2}^{1} dx}{\int_{1/3}^{1} dx} = \frac{3}{4}. \]

But we must now notice that if a number lies between 1
and $\frac13$, its reciprocal lies between 1 and 3, whereas if it lie
between 1 and $\frac12$, its reciprocal lies between 1 and 2, and
the question arises, have we not found two incompatible
answers to the same problem?
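The two computations can be set side by side. In this sketch the function name is an assumption, and the second interval is taken to be that of the reciprocals, from 1/3 to 1, as the argument requires.

```python
from fractions import Fraction as F

def uniform_probability(lo, hi, a, b):
    """P(a <= X <= b) when X is spread uniformly over [lo, hi], lo <= a <= b <= hi."""
    return F(b - a) / F(hi - lo)

# The number itself taken as independent variable on [1, 3].
direct = uniform_probability(1, 3, 1, 2)

# Its reciprocal taken as independent variable on [1/3, 1].
reciprocal = uniform_probability(F(1, 3), 1, F(1, 2), 1)

print(direct, reciprocal)
```

The same event receives the probability 1/2 under the first choice of variable and 3/4 under the second.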

A neater paradox of the same sort is due to Bertrand.* 


Example 1] A chord is drawn at random across a circle: what 
is the probability that it is at least as long as a radius? 


First reasoning. The direction of the chord is obviously 
immaterial, as the circle lies symmetrically about the centre. 
All depends upon the distance of the chord from the centre of 
the circle. As we have nothing to guide us here, we assume 
that all such distances, not greater than a radius, are equally 
likely. The chord will be as large as a radius if this distance
be $\le \frac{\sqrt{3}}{2} r$. Our probability is, then, $\frac{\sqrt{3}}{2} = 0.866+$.


Second reasoning. The position of the first intersection 
with the circle is immaterial, owing to this same symmetry, 
all depends upon the second intersection. All positions for 
this second intersection being equally likely, all angles between 
the chord and the radius are equally likely. The chord will 


* Bertrand, loc. cit., p. 4. 




be as great as a radius if this angle be not over 60°, and the
probability is $\frac{2}{3} = 0.666+$.

Which answer is right? Neither, in an absolute sense. 
It would be easy to try the matter out experimentally in 
such a way that the frequency approached the one or the 
other as a limit. If a disk were cut out of cardboard, and 
were thrown at random on a table ruled with parallel lines 
a diameter apart, then one and only one of these lines would 
cross the disk. All distances from the centre would be 
equally likely, and we should have a ratio approaching the 
first answer. On the other hand, if the disk were held by 
a pivot through a point on its edge, which point lay upon 
a certain straight line, and were then spun with a random 
velocity about the pivot, the frequency ratio would approach 
the second value. The best that we can ever do in almost 
any case is to make the best guess as to the proper independent 
variable which our common sense can suggest, and calculate 
a tentative answer therefrom. 
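Both experiments are easily imitated by random sampling. The sketch below is illustrative: the function names, the seed, and the number of trials are assumptions. The first sampler takes the distance from the centre as the independent variable, the second the angle between chord and radius.

```python
import math
import random

def chord_by_distance(rng):
    """Chord length in a unit circle, distance from the centre uniform on [0, 1]."""
    d = rng.random()
    return 2 * math.sqrt(1 - d * d)

def chord_by_angle(rng):
    """Chord length in a unit circle, angle between chord and radius at the
    first intersection uniform on [0, 90 degrees]."""
    a = rng.uniform(0, math.pi / 2)
    return 2 * math.cos(a)

def frequency(sampler, trials=200_000, seed=0):
    """Fraction of sampled chords at least as long as the radius."""
    rng = random.Random(seed)
    hits = sum(sampler(rng) >= 1.0 for _ in range(trials))
    return hits / trials

print(frequency(chord_by_distance))   # near 0.866
print(frequency(chord_by_angle))      # near 0.667
```

Each frequency approaches the answer belonging to its own choice of independent variable; neither refutes the other.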

It is fair to say in this connexion that there are exceptional
problems where the answer is independent of the choice of
the independent variables. The following one is due to the
genius of Poincaré.* A wheel turning freely about a fixed
horizontal axis is divided into a large even number of equal
divisions, painted alternately red and black. The wheel is
set spinning. What is the probability that when it comes to
rest, a fixed point near the periphery will be opposite a red
sector? The result, red or black, will depend upon the total
angle $\theta$ of spin after a marked point on the wheel has passed
the fixed point for the first time. Let $f(\theta)\,d\theta$ be the probability
that this angle shall be in the interval $\theta \pm \frac12 d\theta$. This function
is strictly unknown, but we may assume that it is continuous,
with a continuous first derivative, and that the value of this
latter is always numerically $\le M$. We take $\theta$ as an abscissa,
and plot the curve $y = f(\theta)$.

An infinite value of $\theta$ being impossible, let us suppose that
the whole region of variation for $\theta$ runs from 0 to $n\epsilon = l$,
where $\epsilon$ is the size of one angular division of the wheel. Let


* Poincaré, loc. cit., p. 127. 




us show that very nearly one half the area under the curve is
under those regions, shaded in the accompanying figure, which
correspond to the red sectors. If $M_1$ and $M_2$ be the maximum
and minimum values for $f$ at points of two adjacent divisions,
then the difference in area between the two cannot exceed
\[ 2\epsilon\,(M_1 - M_2). \]

We next note that, by the law of the mean, $M_1 - M_2$ is equal
to the difference between the corresponding abscissas, multiplied
by the numerical value of the slope of the tangent at
some intermediate point, i.e. $(M_1 - M_2) \le 2\epsilon M$.

The difference between succeeding areas is thus $< 4\epsilon^2 M$, and
the total difference between shaded and unshaded areas, i.e.
the total difference between the probability for ending
opposite a red or a black sector, is
\[ < \sum_{1}^{n/2} 4\epsilon^2 M = 2 l \epsilon M. \]

Fig. 1.

As $\epsilon$ decreases indefinitely, $l$ is constant, as is $M$; our theorem
is thus proved.
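The argument can be tried on a particular density. In the sketch below everything is an assumption made for illustration: the density $f(\theta) = 12\theta^2(l-\theta)/l^4$ on $[0, l]$, its distribution function, the function names, and the convention that the divisions of even index are red. For this density the slope never exceeds $12/l^2$ numerically.

```python
def cdf(theta, l):
    """Distribution function of the assumed density 12 θ² (l - θ) / l⁴ on [0, l]."""
    return (4 * l * theta ** 3 - 3 * theta ** 4) / l ** 4

def red_probability(n, l=1.0):
    """Probability of stopping in a red division when [0, l] is cut into
    n equal divisions (n even), those of even index being red."""
    eps = l / n
    return sum(cdf((k + 1) * eps, l) - cdf(k * eps, l) for k in range(0, n, 2))

def poincare_bound(n, l=1.0):
    """The bound 2 l ε M on |P(red) - P(black)|, with M = 12 / l² for this density."""
    eps = l / n
    M = 12 / l ** 2
    return 2 * l * eps * M

for n in (10, 100, 1000):
    red = red_probability(n)
    print(n, red, abs(2 * red - 1), poincare_bound(n))
```

As the number of divisions grows, the difference between the red and black probabilities stays within the bound and falls towards zero.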

We have thus seen that the answers given to problems in 
geometrical probability are subject to considerable suspicion, 
still it is certainly true that there are quite a number of cases 
where the choice of the independent variable is clearly dictated 
by the circumstances, and where, as a matter of fact, the 
results are found to check up well in practice. Such problems 




are also valuable as exercises in the integral calculus, and 
are, consequently, popular in text-books upon that subject. 
We shall give a number of the most entertaining.* 


Example 2] A line of given length is divided into three parts:
what is the probability that these can be put together to
form a triangle?

Let $x$ be the abscissa of the point which is marked first, $x'$
of that which is marked second. If the point marked first be
to the left of that marked second, of which the probability is $\frac12$,
its abscissa must lie between 0 and $l/2$, where $l$ is the length of
the line. The abscissa of the second point must then lie
between $l/2$ and $l/2 + x$. The probability for this is
\[ \int_0^{l/2} \frac{dx}{l} \int_{l/2}^{x + l/2} \frac{dx'}{l} = \frac{1}{8}. \]

There is an equal probability when the point marked first
is to the right of the other, hence the total probability is $\frac14$.

Here is another solution which is simple and amusing. Let
the length of the line be 1, and the lengths of the parts be
$x$, $y$, and $z$:
\[ x + y + z = 1, \]
\[ y + z > x, \qquad z + x > y, \qquad x + y > z. \]

We may take $x$, $y$, and $z$ as the distances of a point from
the sides of an equilateral triangle whose sides have the
lengths $2/\sqrt{3}$, the point being within the triangle. The three
inequalities will prevent the point from being further from
any one side than one half the length of the median thereon.
It must therefore lie within the similar triangle whose vertices
are the middle points of the given sides, and this smaller
triangle has one-fourth of the area of the larger one.
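A trial by random sampling agrees with both solutions. The sketch is illustrative: the function names, the seed, and the number of trials are assumptions, and the two points of division are taken uniformly and independently on the line.

```python
import random

def can_form_triangle(x, y, l=1.0):
    """True if cutting a line of length l at abscissas x and y gives three
    parts, each less than the sum of the other two."""
    a, b = min(x, y), max(x, y)
    parts = (a, b - a, l - b)
    return max(parts) < l / 2      # equivalent to the three triangle inequalities

def triangle_frequency(trials=200_000, seed=0):
    rng = random.Random(seed)
    hits = sum(can_form_triangle(rng.random(), rng.random()) for _ in range(trials))
    return hits / trials

print(triangle_frequency())   # near 0.25
```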


* The best collection, from which the following problems are taken, is 
Czuber’s Geometrische Wahrscheinlichkeiten, Leipzig, 1884.


Problems. 

1. Three lengths are taken at random, not greater than three given
lengths a, b, and c: what is the probability that they can be combined to 
form a triangle ? 

2. Two points are taken at random on a line segment of length a: what 

is the probability that their distance shall not exceed b? 




Example 3] Given the quadratic equation
\[ x^2 + 2px + q = 0, \qquad -P \le p \le P, \quad -Q \le q \le Q, \]
what is the probability that the roots are real?

Let us take g and p as abscissa and ordinate of a point in 
the plane. The total region is the rectangle whose four 
corners are the points (+P, +Q), the area being 4PQ. The 
favourable region is not within the parabola p? = q. 


Fig. 2. [The rectangle and the parabola $p^2 = q$ in the two cases $P^2 > Q$ and $P^2 < Q$.]


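In the first case where $P^2 > Q$, the chance for imaginary
roots is
\[ \frac{2}{4PQ} \int_0^Q q^{1/2}\,dq = \frac{Q^{1/2}}{3P} < \frac{1}{3}. \]
Favourable chance is
\[ 1 - \frac{1}{3}\,\frac{Q^{1/2}}{P} > \frac{2}{3}. \]
In the second case where $P^2 < Q$, the favourable chance is
\[ \frac{1}{2} + \frac{1}{4PQ} \int_{-P}^{P} p^2\,dp = \frac{1}{2} + \frac{P^2}{6Q} < \frac{2}{3}. \]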
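The two cases may be gathered into one function and compared with a trial by sampling. This is a sketch under stated assumptions: the function names, seed, and sample values of $P$ and $Q$ are illustrative, and $p$ and $q$ are taken uniformly over the rectangle.

```python
import math
import random

def real_root_probability(P, Q):
    """Chance that x^2 + 2px + q = 0 has real roots, p uniform on [-P, P]
    and q uniform on [-Q, Q]."""
    if P * P >= Q:
        return 1 - math.sqrt(Q) / (3 * P)
    return 0.5 + P * P / (6 * Q)

def real_root_frequency(P, Q, trials=200_000, seed=0):
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        p = rng.uniform(-P, P)
        q = rng.uniform(-Q, Q)
        if p * p >= q:          # the roots are real when p^2 >= q
            hits += 1
    return hits / trials

for P, Q in ((2.0, 1.0), (1.0, 4.0), (1.0, 1.0)):
    print(P, Q, real_root_probability(P, Q), real_root_frequency(P, Q))
```

When $P^2 = Q$ both expressions give 2/3, so the two cases join continuously.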


Example 4] Two points are taken at random within a circle:
what is the probability that the circle through them and
the centre of the given circle does not go outside of the
latter?
The radius of the circle being unity, the probability that
a given point shall lie at a distance from the centre between
$x$ and $x + dx$ is not $dx/1$, as one might hastily assume, but is
the ratio of the area of the ring containing all such points to
the area of the circle, namely
\[ (1/\pi)\,2\pi x\,dx = 2x\,dx. \]
If the circle through $P$, $Q$, and $O$, the given centre, do not
go outside the given circle, the point $Q$ must lie within one
of the two circles of radius $\frac12$ passing through $O$ and $P$. The
distance between the centres of these is
\[ 2\sqrt{\tfrac14 - \tfrac{x^2}{4}} = \sqrt{1 - x^2}. \]
The common chord subtends at the centre of each an angle
$2\sin^{-1}x$; the area of the available region for $Q$ is
\[ [2\pi - 2\sin^{-1}x]\left(\tfrac12\right)^2 + 2 \cdot \tfrac{x}{4}\,[1 - x^2]^{1/2} = \tfrac12\left(\pi - \sin^{-1}x + x\,[1 - x^2]^{1/2}\right). \]
Hence the probability sought is
\[ \frac{1}{\pi} \int_0^1 \left( [\pi - \sin^{-1}x]\,x + x^2\,[1 - x^2]^{1/2} \right) dx = \frac{7}{16}. \]


We now come to the most famous of all problems in 
geometrical probability. 


Buffon’s needle problem.* 


A smooth table is ruled with parallel lines separated by 
a distance d. A needle whose length is l, less than d, is 
thrown at random on the table. What is the probability 
that it will cross one of the parallels? 


The chance that the distance from the centre of the needle
to the nearest parallel should lie between the limits $x$ and
$x + dx$ is $2\,dx/d$.


* Buffon, ‘Essai d’arithmétique morale’. See his Œuvres, ed. 1801 (An.
VIII), vol. xxi, pp. 168 ff. 




The chance that in this case the needle should cross the
nearest parallel is
\[ \frac{2}{\pi} \cos^{-1} \frac{2x}{l}. \]

Hence the probability required is
\[ \frac{4}{\pi d} \int_0^{l/2} \cos^{-1} \frac{2x}{l}\,dx = \frac{2l}{\pi d} \int_0^1 \cos^{-1} y\,dy = \frac{2l}{\pi d}. \qquad (2) \]


Another simple and ingenious solution was found by 
Barbier.* The probability of crossing a line is the expecta- 
tion of a man who shall receive one pistole if the needle cross, 
and none if it do not. This is the sum of the expectations 
from the various infinitesimal segments of the needle, and will 
not be altered if the latter be bent in any way. Assuming, 
then, that the needle is made of such inferior steel that it can 
be bent into the form of a circle, of diameter $l/\pi$, the probability
that the circle shall cross one of our lines is $l/\pi d$; but
the expectation is double that, for if it cross once it will cross 
twice. This gives the same answer as before. 
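The experiment is readily imitated. The following sketch is illustrative: the function name, the seed, the number of trials, and the sample lengths are assumptions. The independent variables are those of the first solution, namely the distance of the centre of the needle from the nearest parallel and the angle of the needle with the perpendicular to the parallels. Since the sampler itself uses the value of $\pi$, it serves as a check on formula (2), not as an independent determination of $\pi$.

```python
import math
import random

def buffon_frequency(l, d, trials=200_000, seed=0):
    """Fraction of throws in which a needle of length l < d crosses a parallel."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        x = rng.uniform(0, d / 2)              # centre to nearest parallel
        phi = rng.uniform(0, math.pi / 2)      # needle against the perpendicular
        if (l / 2) * math.cos(phi) >= x:       # half-needle reaches the parallel
            hits += 1
    return hits / trials

l, d = 1.0, 2.0
print(buffon_frequency(l, d), 2 * l / (math.pi * d))
```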

Buffon’s needle problem has induced a number of persons
to try calculating $\pi$ experimentally in this
way. The most elaborate series of experiments was carried
out in the year 1901 by Lazzerini,† who made 3,408 trials and
got the value $\pi = 3.1415929$, an error of 0.0000003.

Let us pause for a moment to discuss this result. The 
natural method in such cases is to treat the problem as one in 
relative discrepancy, and find the probability that the latter 
should be within assigned limits. But here the relative 

* Barbier, Liouville’s Journal, Series 2, vol. v, 1860, pp. 273 ff., contains 
a number of interesting problems. 


† Lazzerini, ‘Una applicazione del calcolo delle Probabilità’, Periodico di
Matematica, (2) vol. iv, 1901, pp. 140 ff.


Problems. 
1. The points P and Q are taken at random in a circle. What is the
probability that the circle with P as centre and radius PQ will lie inside ?
2. Two points are taken at random within a circle. What is the
probability that the perpendicular from the centre on their line does
not pass between them?
3. Do Buffon’s needle problem when the length of the needle is greater
than the distance between the parallels.




discrepancy is so small that the discrepancy in the number
of crossings would be less than unity, and the safest plan
is to assume that there was no absolute discrepancy at all.
The probability for that, according to Ch. III (9), is
\[ 1/\sqrt{2\pi npq}. \]
We have no information as to the relative lengths of $l$
and $d$, but probably shall make no great error in the final
conclusion if we make the simple assumption
\[ 2l = d, \qquad p = \frac{1}{\pi}. \]
The probability for finding no discrepancy will then be
\[ \frac{1}{\sqrt{6816\left(1 - \frac{1}{\pi}\right)}} = \frac{1}{69}. \]

It is much to be feared that in performing this experiment
Lazzerini ‘watched his step’.

Barbier’s method of solving Buffon’s needle problem is 
easily extended to other cases, and gives an easy solution of 
the more difficult problem of finding the probability that 
a line shall cross a closed convex contour or oval. We shall 
imagine that the experiment is carried out in such a way 
that we are justified in taking as independent variables the 
distance of the line from a fixed point, and its angle with 
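a fixed direction. If these numbers be $p$ and $\theta$, and if we
slide the origin to the point $(x_1, y_1)$, and swing through the
angle $\phi$,
\[ p' = x_1 \cos(\theta - \phi) + y_1 \sin(\theta - \phi) + p, \qquad \theta' = \theta - \phi, \]
\[ \frac{\partial p'}{\partial p} = 1, \quad \frac{\partial p'}{\partial \theta} = -x_1 \sin(\theta - \phi) + y_1 \cos(\theta - \phi), \quad \frac{\partial \theta'}{\partial p} = 0, \quad \frac{\partial \theta'}{\partial \theta} = 1, \]
\[ \frac{\partial(\theta' p')}{\partial(\theta p)} = 1. \]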
Problem. 


Captain Fox (Messenger of Mathematics, vol. ii, pp. 113, 114) made 1,120 
trials of Buffon’s needle problem with the resulting value of $\pi$, 3.1419.
Discuss this result. 




Since this Jacobian is equal to 1, the probabilities for an 
arbitrary line are independent of the point and direction of 
reference for the normal coordinates, an important point 
easily overlooked. The probability that a line shall pass 
between two given points is proportional to the length of their 
segment, and is independent of the direction of the line when 
every line passing between them is permissible. The probability
that a line shall cross an oval is one half the expectation
of a man who shall receive one pistole for each intersection,
tangency counting double, and this, in turn, is one half the 
sum of the expectations for each linear element. It is, there- 
fore, proportional to the perimeter of the oval. If a line cross 
a certain oval it may, or may not, cross a second oval within 
the first ; it cannot, however, cross the latter without crossing 
the former. The probability of crossing the inside oval is, 
thus, the probability of crossing the first, multiplied by the 
probability that, having crossed the first, it shall also cross 
the second. We thus find the latter probability by dividing the 
probability of crossing the inside oval by that of crossing 
the outside one, and the factor of proportionality cancels out, 
giving us: 


Theorem 1] The probability that a line which crosses a given 
oval shall also cross a second such oval inside the first 
is the ratio of the perimeters of the two. 


In particular, to solve Buffon’s needle problem, we have 
merely to treat the needle as an extremely thin oval, and glue 
it on a circular disk of diameter d. 

The probability that a line segment should intersect an
oval is the expectation of a man who shall receive one pistole 
if the segment meet the oval once or twice. If the segment 
be extremely short, the probability of this latter is negligible 
in comparison with that of the former. The probability 
that a short segment should meet an oval is proportional to 
the product of its length and the probability that its line 
should meet the oval, i.e. proportional to the product of its 
length multiplied by the perimeter of the oval. 




Theorem 2] The probability that two ovals should intersect
is proportional to the product of their perimeters.

Let us find the probability that a line should intersect two 

mutually exterior ovals. Let them be connected by direct 

and transverse common tangents as shown in Fig. 3. The 


Fig. 3.


probability of meeting the outside contour is the probability
of meeting at least one half of the figure ∞. This is proportional
to the total perimeter of the ∞ less the probability of
meeting both parts of the ∞, which is the probability of
meeting both ovals.


Theorem 3] The probability that a line shall intersect two
mutually exterior ovals is proportional to the difference
between the perimeter of the figure ∞ formed by the
ovals and their transverse common tangents, and the
perimeter of the convex figure formed by the ovals and
their direct common tangents.


Example 5] If a line cross a rectangle of dimensions a and
b, what is the probability that it will cross two opposite
sides?


We consider the sides as indefinitely thin ovals, and add the 
probabilities for each pair. 


\[ \frac{2\sqrt{a^2 + b^2} + 2a - 2(a+b)}{2(a+b)} + \frac{2\sqrt{a^2 + b^2} + 2b - 2(a+b)}{2(a+b)} = \frac{2\sqrt{a^2 + b^2}}{a+b} - 1. \]
Let us calculate this probability again, taking as independent 
variables the positions of the points on the perimeter. The 




probability that the first intersection should be on a side $a$,
and that the second intersection, which must not be on the
same side, should be on the opposite one is
\[ \frac{a}{a+b} \cdot \frac{a}{a+2b} = \frac{a^2}{(a+b)(a+2b)}. \]
We have an analogous probability when the first intersection
is on a side $b$; adding the two together we get
\[ \frac{a^2 - \frac12 ab + b^2}{a^2 + \frac52 ab + b^2}, \]

and this is somewhat less. 
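The two answers are easily compared for particular rectangles. In this sketch the function names and the sample dimensions are assumptions; the first function takes the lines themselves as equally likely (the perimeter argument), the second takes the points of the perimeter as independent variables.

```python
import math

def opposite_sides_by_lines(a, b):
    """Chance of crossing two opposite sides, from the perimeter theorems:
    2*sqrt(a^2 + b^2)/(a + b) - 1."""
    return 2 * math.hypot(a, b) / (a + b) - 1

def opposite_sides_by_points(a, b):
    """Chance of crossing two opposite sides when the intersections are
    chosen by position on the perimeter."""
    first_on_a = (a / (a + b)) * (a / (a + 2 * b))
    first_on_b = (b / (a + b)) * (b / (b + 2 * a))
    return first_on_a + first_on_b

for a, b in ((1.0, 1.0), (2.0, 1.0), (5.0, 1.0)):
    print(a, b, opposite_sides_by_lines(a, b), opposite_sides_by_points(a, b))
```

For a square the first gives $\sqrt{2} - 1 = 0.414+$ and the second $\frac13$.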

What will be the probability of passing between two ovals? 
This is clearly the difference between the probabilities of 
meeting the outside contour and that of meeting at least one 
oval, whence, by the theorem of total probability general 
case, we get: 

Theorem 4] The probability that a line will pass between
two mutually exterior ovals is proportional to the
difference between the perimeter of the ∞ and the sum
of their perimeters.

Example 6] Two secants are drawn across an oval: what is the 
probability that they will intersect within the curve ? 

Let $p$ be the length of the normal on the first secant, from
a chosen origin within the oval, let $\theta$ be the angle which
this perpendicular makes with a fixed direction, $l$ the length
of the chord; the probability that a second secant shall cross
this chord is
\[ \frac{d\theta}{\pi}\,\frac{dp}{L}\,\frac{2l}{s}. \]

Here $L$ is the distance between the two tangents parallel
to the chord, and $s$ is the perimeter of the curve. Now
\[ \int_0^L l\,dp = \text{Area}, \]
hence the probability we seek is
\[ \frac{2\,\text{Area}}{s} \int_0^{\pi} \frac{d\theta}{\pi L}. \]


We must find this integral by an indirect method. We see, 




in fact, that $1/L$ is the probability that a secant in the given
direction, which crosses the given oval, should also cross
a circle of diameter 1 within the oval, and by Theorem 1]
\[ \frac{1}{\pi} \int_0^{\pi} \frac{d\theta}{L} = \frac{\pi}{s}. \]
The probability sought is, therefore, $2\pi\,\text{Area}/s^2$.
In the case of a circle this is $\frac12$. When the area is given,
the circumference is a minimum when the oval is a circle.
We thus reach a rather curious result:


Theorem 4] The probability that two random secants of an
oval should intersect is equal to one half when the oval
is a circle, and less in every other case.
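The expression $2\pi\,\text{Area}/s^2$ may be evaluated for a few simple ovals. The sketch below is illustrative; the function name and the chosen figures are assumptions.

```python
import math

def secant_intersection_probability(area, perimeter):
    """Chance that two random secants of an oval meet within it: 2*pi*Area / s^2."""
    return 2 * math.pi * area / perimeter ** 2

r = 3.0
circle = secant_intersection_probability(math.pi * r * r, 2 * math.pi * r)
square = secant_intersection_probability(1.0, 4.0)            # unit square
rectangle = secant_intersection_probability(2.0 * 1.0, 6.0)   # 2 by 1 rectangle

print(circle, square, rectangle)
```

The circle gives one half whatever its radius; the square gives $\pi/8$, and the oblong rectangle still less.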


Problems. 

1. Find the analogues in 3 dimensions to Theorems 1-4, and Examples 
3-6. 

2. A die is thrown on a board ruled with parallel lines whose distance
is greater than a diagonal of a face of the die. Find the probability that it
will cross a ruling.

3. Find the probability that a line shall intersect two ovals with two
common points.

4. Find the probability that a line shall intersect two ovals with four
common points.


CHAPTER VI 
THE PROBABILITY OF CAUSES 


THE form in which we have so far studied problems of probability
is not always that in which they present themselves
in practice. We have assumed that we knew just which were
the equally likely ways in which an event might happen, or
the proper independent variables when the number was
infinite, and have calculated the probability or frequency ratio 
from them. But it often happens in practice that what we 
know is merely an empirical approximation to the frequency 
ratio from a limited number of cases, and what we wish to 
find out is the likelihood that the actual probability should 
lie within certain assigned limits. To put the matter in con- 
crete form, we saw (p. 50) that Buffon threw a coin 4,040 
times and saw 2,048 heads. What we wish to know is the 
likelihood that that series arose from throwing a good coin. 

We have already learnt one method of meeting the problem, 
namely, to assume that the coin is good, and calculate the 
probability that the discrepancy will be as large as, or larger 
than, that observed. That does not, however, cover the ques- 
tion entirely. It is one thing to say that if a coin be good 
the discrepancy will attain a certain figure a certain propor- 
tion of the time, it is quite another to say that when a certain 
discrepancy has been observed a large number of times when 
a coin of unknown constitution was thrown, a certain proportion
of the trials were in all probability made with a good
coin. It is the latter fraction, not the former, which answers 
the question, ‘ What is the probability that the coin was 
good ?’ 

When we have once grasped the real bearing of the ques- 
tion of the likelihood of a good or bad coin, we see immediately 
that there are two essential elements in the question : 

A) The probability that Buffon should pick up a good or
bad coin to perform the experiments. Assuming that Buffon’s 




good faith is indubitable, this will depend upon the proportion 
of coins in circulation at his time which were good, at least 
for the purposes of such a trial. 

B) The probability that if he threw a good coin, he would 
obtain as large a discrepancy as was observed. 

If it were perfectly certain that no bad coins were in circu- 
lation at that time, it is clear that the problem would be 
meaningless, but that is by no means sure. In the same way, 
if it were absolutely impossible for a good coin to produce an 
observed result, the problem would have no sense. As both 
possibilities are open, we are face to face with a real problem. 

We shall mean by the cause of an event, any antecedent 
event whatever. We mean by the a priori probability that a 
certain cause should be operative before the event in question
has been observed, the limit of the number of occasions where 
the causal event happened to the number of cases where it 
happened or failed, as this latter number is indefinitely 
increased. We mean by the probability that a given cause 
should produce an observed result, the limit of the ratio of the 
number of times where the causal event was followed by the 
observed result, to the total number of times when the causal
event was operative. The reader will not forget that, according
to our first empirical axiom, all trials must be made under
the same essential conditions. Consequently, in determining 
the probability that a certain cause should be operative, or 
that it should produce a certain result, we must assume that 
no other essential features in the situation have been allowed 
to vary. 

Suppose that there is a certain finite class of causes
$C_1, C_2, \dots, C_n$, which might be followed by a certain event, and
that they are mutually exclusive, yet one of them must have
happened. What is the probability that the actual cause
was $C_k$?

Let the a priori probabilities for the various causes be
$\pi_1, \pi_2, \dots, \pi_n$, while the respective probabilities that they should
be followed by the observed event are $p_1, p_2, \dots, p_n$. Let $P$ be
the probability sought.

The probability that cause $C_k$ should occur, and should produce
the observed event is $\pi_k p_k$. But this probability may
be reckoned otherwise. It is the probability that the event
should happen, namely, $\pi_1 p_1 + \pi_2 p_2 + \dots + \pi_n p_n$, multiplied by
the probability $P$ that it should arise from the cause in
question. This gives

Bayes’ Principle *] If $C_1, C_2, \dots, C_n$ be the total number of mutually
exclusive causes of a certain class for an observed
event, one of which must have occurred, if $\pi_1, \pi_2, \dots, \pi_n$ be
their respective a priori probabilities, while $p_1, p_2, \dots, p_n$
are the various probabilities that they should be followed
by the event, then the probability that the operative cause
was $C_k$ is
\[ \frac{\pi_k p_k}{\pi_1 p_1 + \pi_2 p_2 + \dots + \pi_n p_n}. \qquad (1) \]
We shall give a statement of this principle in the case of 
continuous probability, as this will be of use later. We pass 

to it by the usual process of passing over from a sum to a 

definite integral. 

Bayes’ principle for continuous probability] If all causes of a
certain class for an observed event, which are mutually
exclusive, yet one of which must have occurred, depend
analytically upon $n$ independent variables $z_1, z_2, \dots, z_n$ in
such a way that the a priori probability that these
variables take values in the infinitesimal interval
$z_1 \pm \frac12 dz_1$, $z_2 \pm \frac12 dz_2$, ..., $z_n \pm \frac12 dz_n$ differs by an infinitesimal
of higher order from $f(z_1, z_2, \dots, z_n)\,dz_1\,dz_2 \dots dz_n$, $f$
being an analytic function, while the probability that
the observed event shall then follow is $p(z_1, z_2, \dots, z_n)$, then
the probability that the event was produced by a cause
corresponding to variables in a certain region $R$ is
\[ \frac{\int\!\!\int \cdots \int_R f p \, dz_1 \, dz_2 \dots dz_n}{\int\!\!\int \cdots \int f p \, dz_1 \, dz_2 \dots dz_n}, \qquad (2) \]
where the integral in the denominator is taken over the
total field of variation of the variables compatible with
the problem.

* Bayes, ‘An essay towards solving a problem in the Doctrine of Chances’,
Philosophical Transactions Royal Soc., vol. liii, 1763, and ‘A Demonstration of
the Second Rule, &c.’, ibid., vol. liv, 1764. Czuber, Entwickelung der
Wahrscheinlichkeitsrechnung, cit. p. 258, gives the erroneous dates 1764 and 1765.

As a first application of Bayes’ principle we take a well-known paradox of Bertrand’s known as his ‘box paradox’.*

Three boxes look exactly alike. Each contains two drawers, 
and in each drawer is a coin. In the first box there are two 
gold coins, in the second a gold and a silver coin, in the third 
two silver coins. A box is chosen at random and a drawer 
opened: what is the probability that the coin in the other 
drawer of the same box is of the opposite metal ? 

First reasoning. This can only happen if we have hit upon the second box, the chance for that is $\tfrac13$.

Second reasoning. There is a $\tfrac12$ chance that the coin first seen shall be gold. When gold has been seen, we know that we have chosen one of the first two boxes, but we do not know which, they are equally likely, hence the chance for a gold coin followed by a silver is $\tfrac14$. There is an equal chance for a silver coin followed by a gold. Hence the total chance is $\tfrac12$.

It is evident that the first answer is right and the second wrong. The question is, What was wrong with the reasoning in the second case? Here is the flaw. If a gold coin has been seen, the a priori chance for the first or the second box is $\tfrac12$, but whereas the first has a chance 1 of showing a gold coin the first time, the second has only a chance $\tfrac12$ of doing so. The probability that the gold coin is in the second box is

$$\frac{\tfrac12 \cdot \tfrac12}{\tfrac12 \cdot 1 + \tfrac12 \cdot \tfrac12} = \frac13,$$

and there is a similar probability for a silver coin. This leads to the correct answer again.
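The computation above can be checked mechanically. The following is a minimal sketch, not part of the original text: the function name `posterior` is an illustrative assumption, and the three boxes are each given the prior 1/3 (instead of restricting to the two boxes that can show gold), which leads to the same posterior.

```python
from fractions import Fraction

def posterior(priors, likelihoods):
    """Formula (1): probability of each cause, given that the event was observed."""
    joint = [pi * p for pi, p in zip(priors, likelihoods)]
    total = sum(joint)
    return [j / total for j in joint]

# Boxes: gold-gold, gold-silver, silver-silver, each chosen with probability 1/3.
priors = [Fraction(1, 3)] * 3
# Chance that the drawer opened shows a gold coin, for each box.
shows_gold = [Fraction(1), Fraction(1, 2), Fraction(0)]
shows_silver = [Fraction(0), Fraction(1, 2), Fraction(1)]

mixed_given_gold = posterior(priors, shows_gold)[1]
mixed_given_silver = posterior(priors, shows_silver)[1]

# Total chance that the other drawer holds the opposite metal.
p_gold = sum(pi * p for pi, p in zip(priors, shows_gold))
p_silver = sum(pi * p for pi, p in zip(priors, shows_silver))
opposite = p_gold * mixed_given_gold + p_silver * mixed_given_silver
print(mixed_given_gold, opposite)
```

Exact fractions are used so that the agreement with the first reasoning is not blurred by rounding.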


Example 1] An urn contains $N$ balls, black and white, in unknown proportion. A ball is drawn out $n$ times and replaced, the balls being mixed after each drawing, with the result that just $r$ white balls are seen. What is the probability that the urn contains exactly $R$ white balls?


* Bertrand, loc. cit., p. 2. 




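Hypothesis 1] All mixtures of white and black are equally likely a priori. Then all of the $\pi$’s are equal and will cancel out, and we have, by (1)

$$\frac{\dfrac{n!}{r!\,(n-r)!}\left(\dfrac{R}{N}\right)^{r}\left(\dfrac{N-R}{N}\right)^{n-r}}{\displaystyle\sum_{K=0}^{K=N}\dfrac{n!}{r!\,(n-r)!}\left(\dfrac{K}{N}\right)^{r}\left(\dfrac{N-K}{N}\right)^{n-r}}.$$

What value of $R$ will be the most likely? We obtain a ready and sufficiently accurate answer by equating to 0 the derivative with respect to $R$ of the logarithm. This gives

$$\frac{r}{R} - \frac{n-r}{N-R} = 0, \qquad \frac{R}{N} = \frac{r}{n}.$$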

We may, then, say that the most likely mixture is that 
where the actual proportion of white balls is the observed 
proportion, and this, indeed, is just what we should expect. 
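As an illustration of Hypothesis 1, the probabilities for every possible composition can be tabulated directly from (1). This is a small sketch under stated assumptions: the function name `urn_posterior_uniform` and the sample numbers (N = 20, n = 10, r = 3) are illustrative choices, not taken from the text.

```python
from fractions import Fraction
from math import comb

def urn_posterior_uniform(N, n, r):
    """Probability of each count R of white balls when all mixtures are equally likely a priori."""
    weights = [
        comb(n, r) * Fraction(R, N) ** r * Fraction(N - R, N) ** (n - r)
        for R in range(N + 1)
    ]
    total = sum(weights)
    return [w / total for w in weights]

N, n, r = 20, 10, 3
post = urn_posterior_uniform(N, n, r)
most_likely_R = max(range(N + 1), key=lambda R: post[R])
print(most_likely_R, Fraction(most_likely_R, N), Fraction(r, n))
```

With these numbers the most likely proportion of white balls coincides with the observed proportion r/n, as the text asserts; when Nr/n is not a whole number the maximum falls on a neighbouring integer.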

Hypothesis 2] The urn was filled by drawing white and 
black balls at random from an extremely large number of 
balls where the two colours were found in equal profusion.* 

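Here we have

$$\frac{\dfrac{N!}{R!\,(N-R)!}\left(\dfrac12\right)^{N}\dfrac{n!}{r!\,(n-r)!}\left(\dfrac{R}{N}\right)^{r}\left(\dfrac{N-R}{N}\right)^{n-r}}{\displaystyle\sum_{K=0}^{K=N}\dfrac{N!}{K!\,(N-K)!}\left(\dfrac12\right)^{N}\dfrac{n!}{r!\,(n-r)!}\left(\dfrac{K}{N}\right)^{r}\left(\dfrac{N-K}{N}\right)^{n-r}} = \frac{\dfrac{R^{r}(N-R)^{n-r}}{R!\,(N-R)!}}{\displaystyle\sum_{K=0}^{K=N}\dfrac{K^{r}(N-K)^{n-r}}{K!\,(N-K)!}}.$$

The probability that $R$ should be close to $\dfrac{N}{2} + x\sqrt{\dfrac{N}{2}}$ was found, by the reasoning which led to the probability integral, to be $\dfrac{1}{\sqrt{\pi}}\, e^{-x^2}\,\Delta x$.

* Bertrand, loc. cit., p. 152.

We wish to maximize

$$e^{-x^2}\left(\frac{N}{2} + x\sqrt{\frac{N}{2}}\right)^{r}\left(\frac{N}{2} - x\sqrt{\frac{N}{2}}\right)^{n-r},$$

or else

$$e^{-x^2}\left(1 + x\sqrt{\frac{2}{N}}\right)^{r}\left(1 - x\sqrt{\frac{2}{N}}\right)^{n-r}.$$

Equating to zero the derivative of the logarithm,

$$-2x + \frac{r\sqrt{2/N}}{1 + x\sqrt{2/N}} - \frac{(n-r)\sqrt{2/N}}{1 - x\sqrt{2/N}} = 0.$$

Since $x$ is to be of reasonable size, we may assume that $2x^2/N$ is negligible, and reject $x^2$ as compared with $N/2$. We get

$$x = \frac{2r-n}{N+n}\sqrt{\frac{N}{2}}.$$

The most likely composition is

$$\frac{R}{N} = \frac12 + \frac{2r-n}{2(N+n)} = \frac12\,\frac{N+2r}{N+n}.$$

This varies between $\tfrac12$ and $r/n$ as we should expect.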
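The approximate result for Hypothesis 2 can be compared with a direct search over all compositions. The sketch below is illustrative only: the function names and the sample numbers (N = 1000, n = 200, r = 140) are assumptions, and it supposes 0 < r < n so that the extreme compositions R = 0 and R = N can be left out.

```python
from math import comb, log

def log_weight(N, n, r, R):
    # Logarithm of C(N, R) * R^r * (N - R)^(n - r); factors independent of R are omitted.
    return log(comb(N, R)) + r * log(R) + (n - r) * log(N - R)

def most_likely_R(N, n, r):
    # Exhaustive search over the compositions 1 <= R <= N - 1.
    return max(range(1, N), key=lambda R: log_weight(N, n, r, R))

N, n, r = 1000, 200, 140
exact = most_likely_R(N, n, r)
approx = N * (N + 2 * r) / (2 * (N + n))  # N times the 'most likely composition' of the text
print(exact, approx)
```

The two values should agree to within a ball or so when N and n are large; the small remaining gap reflects the terms neglected in the derivation.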

This first example leads us naturally to the idea of establishing some general formula for the probability of causes analogous to the Bernoulli formula. Suppose that an event has succeeded $np$ times and failed $nq$ times in $n$ trials. If all probabilities for success be a priori equally likely, the most likely probability for success is $p$. What is the chance that the observed series resulted from the operations of a cause which gave a probability of success lying between

$$p + t_1\sqrt{\frac{2pq}{n}} \quad\text{and}\quad p + t_2\sqrt{\frac{2pq}{n}}\,?$$


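The probability is

$$\frac{\displaystyle\sum_{P = p + t_1\sqrt{2pq/n}}^{P = p + t_2\sqrt{2pq/n}} \frac{n!}{(np)!\,(nq)!}\,P^{np}(1-P)^{nq}}{\displaystyle\sum_{P=0}^{P=1} \frac{n!}{(np)!\,(nq)!}\,P^{np}(1-P)^{nq}}.$$

With regard to the summations, which are rather meaningless as they stand, we assume that $nP$ is an integer, so that $P$ increases by $1/n$ each time.

Let us write

$$P = p + z\sqrt{\frac{2pq}{n}}, \qquad \Delta z = \frac{1}{\sqrt{2npq}}.$$

We multiply every term above and below by this and cancel the factorials, getting

$$\frac{\displaystyle\sum_{z=t_1}^{z=t_2}\left(p + z\sqrt{\frac{2pq}{n}}\right)^{np}\left(q - z\sqrt{\frac{2pq}{n}}\right)^{nq}\Delta z}{\displaystyle\sum_{z=-\sqrt{np/2q}}^{z=\sqrt{nq/2p}}\left(p + z\sqrt{\frac{2pq}{n}}\right)^{np}\left(q - z\sqrt{\frac{2pq}{n}}\right)^{nq}\Delta z}.$$

Divide numerator and denominator by $p^{np} q^{nq}$.

$$\log\left[\left(1 + z\sqrt{\frac{2q}{np}}\right)^{np}\left(1 - z\sqrt{\frac{2p}{nq}}\right)^{nq}\right] = -z^2 + \epsilon,$$

where $\epsilon$ is of the order of $z^3/\sqrt{n}$.

Our fraction above will approach asymptotically to

$$\frac{\displaystyle\int_{t_1}^{t_2} e^{-z^2}\,dz}{\displaystyle\int_{-\sqrt{np/2q}}^{\sqrt{nq/2p}} e^{-z^2}\,dz}.$$

The limit of this as $n$ increases indefinitely is

$$\frac{1}{\sqrt{\pi}}\int_{t_1}^{t_2} e^{-z^2}\,dz = \frac{1}{\sqrt{\pi}}\int_{0}^{t_2} e^{-z^2}\,dz - \frac{1}{\sqrt{\pi}}\int_{0}^{t_1} e^{-z^2}\,dz.$$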

We thus get 


Theorem 1] If in a large number $n$ of trials where all probabilities for an individual success are a priori equally likely, there be $np$ successes and $nq$ failures, then the probability that the cause is such as to give a probability of success lying between the limits

$$p + t_1\sqrt{\frac{2pq}{n}} \quad\text{and}\quad p + t_2\sqrt{\frac{2pq}{n}}$$

is

$$\tfrac12\,[\Theta(t_2) - \Theta(t_1)]. \tag{3}$$

Converse of Bernoulli’s theorem] Under the conditions of Theorem 1, the probability that the cause is such as to give a probability of success in the limits $p \pm t\sqrt{\dfrac{2pq}{n}}$ is

$$\Theta(t). \tag{4}$$
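Theorem 1 lends itself to a numerical check. The sketch below is illustrative and rests on two assumptions: that the probability integral $\Theta$ is the function `math.erf`, and that a plain midpoint rule is accurate enough here. The function name `posterior_mass` and the sample values are not from the text.

```python
from math import erf, exp, log, sqrt

def posterior_mass(n, successes, lo, hi, steps):
    """Midpoint-rule integral of x^s (1 - x)^(n - s) over [lo, hi], scaled by its peak value."""
    s, f = successes, n - successes
    peak = s * log(s / n) + f * log(f / n)  # logarithm of the integrand at x = s / n
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * h
        total += exp(s * log(x) + f * log(1.0 - x) - peak)
    return total * h

n, s = 10_000, 5_300
p = s / n
q = 1.0 - p
scale = sqrt(2 * p * q / n)
t1, t2 = -1.0, 1.0

exact = (posterior_mass(n, s, p + t1 * scale, p + t2 * scale, 20_000)
         / posterior_mass(n, s, 0.0, 1.0, 200_000))
theorem = 0.5 * (erf(t2) - erf(t1))
print(exact, theorem)
```

The scaling by the peak value only guards against underflow; it cancels in the ratio.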

We must now face the possibility that all causes are not equally likely a priori. We are thrown back upon our formula (2), which we simplify by the law of the mean

$$\int_a^b f\,\phi\; dz = f[a + h(b-a)]\int_a^b \phi\; dz, \qquad 0 < h < 1.$$

The most interesting case is to compare the probabilities under the conditions of Theorem 1], except the a priori condition, that $t$ should lie in the region between $t_1$ and $t_2$, or else between $t_1'$ and $t_2'$. The ratio will be

$$\frac{f[t_1 + h(t_2 - t_1)]}{f[t_1' + h'(t_2' - t_1')]}\cdot\frac{\Theta(t_2) - \Theta(t_1)}{\Theta(t_2') - \Theta(t_1')}. \tag{5}$$

When the regions are both very small, we may put

$$t_1 + t_2 = 2t; \quad t_2 - t_1 = 2\Delta t; \quad t_1' + t_2' = 2t'; \quad t_2' - t_1' = 2\Delta t';$$

the ratio then takes the simpler form:

$$\frac{f(t)\, e^{-t^2}\,\Delta t}{f(t')\, e^{-t'^2}\,\Delta t'}. \tag{6}$$

We shall apply this to an amusing problem proposed by 
Bertrand.* 

‘The owner of a gaming establishment has installed a rou- 
lette wheel. In 10,000 turns this has shown red 5,300 times, 
black 4,700 times. The owner refuses to pay for the wheel 
and claims damages ; his clients have noticed that the wheel 
seems to favour red. They go to law about it. The owner 
claims that a good wheel was never known to show such a 
discrepancy. 300 turns in 10,000 cannot be the result of 
chance. The chance for red is not 1/2, as it ought to be. “Never
mind the record of the turns so far,” says the maker, “you 
cannot insure against the caprices of fortune. The machine 
was made by excellent workmen, and was carefully inspected. 
No part of it is imperfect. There is no bad centring of any 
wheel, no inequality in the size of the divisions, no error in 
levelling.” The Court calls in an expert; what should he 
say ?’ 

* Bertrand, loc. cit., p. 166.

Problem.
Discuss Buffon’s coin and Lazzerini’s needle by the methods of the present chapter.

According to the maker, the probability for red lies between 0.499 and 0.501; according to the owner it lies between 0.529 and 0.531. We have

$$\sqrt{\frac{2pq}{n}} = \frac{\sqrt{2 \times 0.53 \times 0.47}}{100} = 0.0070583,$$

$$0.53 + 0.0070583\,t_2 = 0.501,$$

$$t_2 = -4.10862, \quad t_1 = -4.39197, \quad t = -4.25030, \quad \Delta t = 0.1416,$$

$$0.53 + 0.0070583\,t_2' = 0.531,$$

$$t_2' = 0.1416, \quad t_1' = -0.1416, \quad t' = 0, \quad \Delta t' = 0.1416.$$

We thus need to find

$$\frac{f(t)}{f(0)}\, e^{-(4.2503)^2},$$

and we find from the table on p. 208 that this is

$$\frac{f(t)}{f(0)} \times 0.000000015.$$

Bertrand’s solution is simpler. He compares the probabilities under the two hypotheses of getting just this result, namely,

$$\frac{f(t)}{f(0)}\cdot\frac{(0.5)^{10000}}{(0.53)^{5300}\,(0.47)^{4700}} = \frac{f(t)}{f(0)} \times 0.000000015.$$

It will readily be granted that if the maker of the wheel be 
known to be careful and conscientious, f(t) will be many 
times larger than f (0). It is hard to believe, however, that 
the ratio of the two would be large enough to bring the pro- 
duct up to respectable size. The expert would doubtless 
decide against the maker. 
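The figures in the roulette problem are easy to recompute. The following sketch is an illustration only; the variable names are assumptions, and the unknown factor f(t)/f(0) is deliberately left out, so that only the numerical multipliers are compared.

```python
from math import exp, log, sqrt

n, reds = 10_000, 5_300
p = reds / n
q = 1 - p

scale = sqrt(2 * p * q / n)     # the quantity sqrt(2pq/n) of Theorem 1
t_maker = (0.5 - p) / scale     # centre of the maker's interval, measured in units of scale

# Multiplier of f(t)/f(0) obtained from formula (6), with t' = 0 and equal widths.
ratio_integral = exp(-t_maker ** 2)

# Bertrand's multiplier: ratio of the two probabilities of exactly 5,300 reds in 10,000 turns.
log_bertrand = n * log(0.5) - reds * log(p) - (n - reds) * log(q)
ratio_bertrand = exp(log_bertrand)

print(scale, t_maker, ratio_integral, ratio_bertrand)
```

Both multipliers come out of the order of one part in a hundred million, so the a priori ratio f(t)/f(0) would have to be enormous to rescue the maker.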

There is another point that should be noted in this connexion, which is rather subtle and easily overlooked. We have no right to settle after the event what constitutes really a remarkable run. Let us return to Buffon and his coin. It will be noted that the discrepancy was 28, and this is exactly the year of the Christian era when John the Baptist was cast into prison. Let us examine the probability that the coin was so constructed as to show this date when thrown that number of times. It is easy to calculate the probability that a coin giving to heads the probability 507/1010 should show no discrepancy in 4,040 throws, and this is considerably greater than the probability that a good coin should show exactly the discrepancy 28. But the a priori probability that a coin should be so constructed as to predict the date of John the Baptist’s imprisonment is so microscopic compared with the probability that a coin should be good, that we reject the former hypothesis without more discussion. The reader will find it amusing to apply this type of reasoning to such problems as the probability that the great Pyramid was specially placed by Divine Providence to reveal the value of $\pi$, the length of the British inch, and other interesting facts which Piazzi Smyth and others have deduced from its measurements.

Bayes’ principle has sometimes been used to deduce the 
probability for future events. The reader will have no 
difficulty in proving: 


Bayes’ principle applied to future events] If $C_1, C_2, \dots, C_n$ be the total number of mutually exclusive causes for an observed event, one of which must have occurred, if $\pi_1, \pi_2, \dots, \pi_n$ be their respective a priori probabilities, $p_1, p_2, \dots, p_n$ the various probabilities that they should be followed by the observed event, while $P_1, P_2, \dots, P_n$ are the respective probabilities that they shall be followed by an expected event, then the probability that the expected event shall take place is

$$\frac{\displaystyle\sum_{k=1}^{k=n} \pi_k\, p_k\, P_k}{\displaystyle\sum_{k=1}^{k=n} \pi_k\, p_k}. \tag{7}$$

In the same way we may prove: 

Bayes’ principle for continuous probability applied to future events] If all causes of a certain class for an observed event, which are mutually exclusive yet one of which must have happened, depend analytically upon $n$ independent variables $z_1, z_2, \dots, z_n$ in such a way that the a priori probability that these variables take values in the infinitesimal intervals

$$z_1 \pm \tfrac12 dz_1,\quad z_2 \pm \tfrac12 dz_2,\quad \dots,\quad z_n \pm \tfrac12 dz_n,$$

differs by an infinitesimal of higher order from

$$f(z_1, z_2, \dots, z_n)\, dz_1\, dz_2 \dots dz_n,$$

while the probability that the observed event shall then follow is $\phi(z_1, z_2, \dots, z_n)$ and the probability for a future event is $\psi(z_1, z_2, \dots, z_n)$, then the total probability for the occurrence of the future event is

$$\frac{\displaystyle\int\!\!\int\!\cdots\!\int f\,\phi\,\psi\; dz_1\, dz_2 \dots dz_n}{\displaystyle\int\!\!\int\!\cdots\!\int f\,\phi\; dz_1\, dz_2 \dots dz_n}, \tag{8}$$

each integral being taken over the whole field of possible values.


Example 2] If in $n$ trials of an event for which all probabilities are equally likely a priori, there have been just $r$ successes, what is the probability that there will be just $R$ in a further series of $N$ trials?


If $x$ be the probability for success, the probability will be, by (8)

$$\frac{\dfrac{N!}{R!\,(N-R)!}\displaystyle\int_0^1 x^{r+R}(1-x)^{n+N-(r+R)}\,dx}{\displaystyle\int_0^1 x^{r}(1-x)^{n-r}\,dx}.$$

Now

$$\int_0^1 x^{l}(1-x)^{m}\,dx = \frac{l!\,m!}{(l+m+1)!}.$$

Hence our desired probability is

$$\frac{N!}{R!\,(N-R)!}\cdot\frac{(R+r)!\,(N+n-(R+r))!}{(N+n+1)!}\cdot\frac{(n+1)!}{r!\,(n-r)!}. \tag{9}$$


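When all the numbers are large we may apply Stirling’s formula, getting

$$\frac{1}{\sqrt{2\pi}}\;\frac{N^{N+\frac12}\,(R+r)^{R+r+\frac12}\,(N+n-(R+r))^{N+n-(R+r)+\frac12}\,n^{n+\frac32}}{R^{R+\frac12}\,(N-R)^{N-R+\frac12}\,(N+n)^{N+n+\frac32}\,r^{r+\frac12}\,(n-r)^{n-r+\frac12}}. \tag{10}$$
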
When we are interested in only one further trial, $N = R = 1$, and (9) becomes

$$\frac{(r+1)!\,(n-r)!\,(n+1)!}{(n+2)!\,r!\,(n-r)!} = \frac{r+1}{n+2}. \tag{11}$$

When the event has never failed so far, $r = n$ and we have

$$(n+1)/(n+2). \tag{12}$$
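Formula (9) and its special cases (11) and (12) can be verified with exact arithmetic. This is a small sketch; the function name `predictive` and the sample numbers are illustrative assumptions.

```python
from fractions import Fraction
from math import comb, factorial

def predictive(n, r, N, R):
    """Formula (9): chance of just R successes in N further trials after r successes in n."""
    return (
        Fraction(comb(N, R))
        * Fraction(factorial(R + r) * factorial(N + n - (R + r)), factorial(N + n + 1))
        * Fraction(factorial(n + 1), factorial(r) * factorial(n - r))
    )

n, r = 10, 7
next_success = predictive(n, r, 1, 1)                       # formula (11)
never_failed = predictive(n, n, 1, 1)                       # formula (12)
total = sum(predictive(n, r, 5, R) for R in range(5 + 1))   # the cases R = 0..5 exhaust all outcomes
print(next_success, never_failed, total)
```

Summing (9) over every possible R is a useful sanity check, since the probabilities of all outcomes of the further series must add up to unity.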

The most absurd consequences have, in the past, been 
deduced from this formula. Putting n equal to the number 
of times the sun has risen, it has been used to estimate the 
probability that it will rise the next day. Nothing could be 
more grotesque. The rising of the sun is not a statistical 
event whose cause is obscure, but a mechanical necessity 
which will continue as long as present astronomical conditions 
do, and will then cease. To use formula (12) we should have 
to assume that all possible cosmogonies were equally likely. 
What such a phrase may mean is utterly beyond our com- 
prehension: it undoubtedly means nothing whatsoever. 

The probability that exactly the same proportion of success will appear in a second series of $n$ trials as appeared in a first series will be found from (9) by putting $N = n$, $R = r$.

$$\frac{n+1}{2n+1}\left[\frac{n!}{r!\,(n-r)!}\right]^{2}\frac{(2r)!\,(2n-2r)!}{(2n)!}.$$

Replacing the first factor by $\tfrac12$, approximating to the rest by Stirling’s formula, we have

$$\sqrt{\frac{n}{4\pi r(n-r)}}.$$

On the other hand, if we surely knew that the probability for success was $r/n$, the probability for exactly $r$ successes is given by the last formula of Ch. III, § 2, namely

$$\sqrt{\frac{n}{2\pi r(n-r)}}.$$

The difference between the two arises from the fact that in 
one case we are sure of the probability of success; in the 
other, we only surmise it.* 
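The comparison between the surmised and the known probability can be made concrete. The sketch below is illustrative: the function names and the sample values n = 1000, r = 300 are assumptions, and logarithms of factorials are taken with `math.lgamma` to avoid enormous integers.

```python
from math import exp, lgamma, log, pi, sqrt

def log_comb(a, b):
    # Logarithm of the binomial coefficient a! / (b! (a - b)!).
    return lgamma(a + 1) - lgamma(b + 1) - lgamma(a - b + 1)

def repeat_prob_surmised(n, r):
    # Formula (9) with N = n, R = r: (n+1)/(2n+1) * C(n, r)^2 / C(2n, 2r).
    return (n + 1) / (2 * n + 1) * exp(2 * log_comb(n, r) - log_comb(2 * n, 2 * r))

def repeat_prob_known(n, r):
    # Bernoulli probability of exactly r successes when the chance of success is surely r/n.
    x = r / n
    return exp(log_comb(n, r) + r * log(x) + (n - r) * log(1 - x))

n, r = 1000, 300
surmised = repeat_prob_surmised(n, r)
known = repeat_prob_known(n, r)
approx_surmised = sqrt(n / (4 * pi * r * (n - r)))
approx_known = sqrt(n / (2 * pi * r * (n - r)))
print(surmised, approx_surmised, known, approx_known, known / surmised)
```

For large numbers the ratio of the two probabilities should settle near the square root of 2, which is the whole difference between the two approximate formulas.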

It is perfectly evident that Bayes’ principle is open to very grave question, and should only be used with the greatest caution. The difficulty lies with the a priori probabilities. We generally have no real line on them, so take them all equal. Suppose that $n$ balls have been drawn at random from an urn having white and black balls in unknown mixture, and that a white ball has been drawn just $r$ times. What is the probability of drawing a white ball the next time? We should like to use formula (11). When is it safe to do so?

* This interesting comparison is taken from Czuber, Wahrscheinlichkeitsrechnung, cit. p. 200.

That formula was derived on the hypothesis that all mixtures were, a priori, equally likely. That does not mean that when we know nothing at all about an urn all mixtures are equally likely. We have already discussed that meaning of equally likely in Ch. I. What it does mean is this: * Imagine an immense number of urns containing black and white balls in varying proportions, but with a fixed number of urns with each mixture. Then if an urn be drawn at random and $n$ drawings, with replacement, be made therefrom, showing just $r$ white balls, the probability that the next ball will be white is accurately given by (11). It is only when we can give a really precise statement of this sort that Bayes’ principle can be used with perfect confidence, and the cases are rare.

Why not, then, reject the formula outright? Because, 
defective as it is, Bayes’ formula is the only thing we have 
to answer certain important questions which do arise in the 
calculus of probability. The question as to the likelihood 
that a coin which showed a given succession of heads and 
tails should be bad is real and insistent. To say what might 
reasonably have been expected from a good coin under the 
circumstances does not, by any means, cover the case. There- 
fore we use Bayes’ formula with a sigh, as the only thing 
available under the circumstances : 

‘Steyning tuk him for the reason the thief tuk the hot 
stove—bekaze there was nothing else that season.’ + 


* Cf. Castelnuovo, loc. cit., p. 170.
+ Kipling, Captains Courageous, ch. vi. 


CHAPTER VII 
ERRORS OF OBSERVATION 


§ 1. Determination of the ‘Best Value’ 


THERE is no such thing as a perfect physical measurement. 
Absolute accuracy is a fiction, and is never attained in 
practice. What is meant by an ‘exact value’ is a value 
which is sufficiently exact for purposes of a certain class. In 
fact it is not always possible to say what is meant by the 
‘true value’ of any quantity. What is the true length of 
a bar of iron? That will depend on the temperature of the 
iron; perhaps on the direction and velocity of its motion 
through space, if the recent theories of relativity be correct. 
But if there be room for doubt as to what the true value 
really is, there will be infinitely more about any attempts to 
measure it. Suppose that we say that two towns are exactly 
three and one-half miles apart, what do we really mean? 
Different persons will mean different things by these same 
words. A careless person might mean that some point within 
a few rods of the post office in one is exactly three and one-half
miles from some point within a few rods of the jail in
the other, but such a statement would never do for a surveyor. 
If he said that the towns were exactly three and one-half 
miles apart he would mean that some landmark, as a mile- 
stone, in one was separated from a similar landmark in the 
other by a distance within a few inches of three and one-half 
miles. The geographical meridian of Paris runs from a mark 
in the middle of a doorway on the south side of the Observatory 
to a short vertical iron rod in the middle of a hole in a stone 
column erected in the park of Mont Souris. This extreme 
topological accuracy would be counted the height of care- 
lessness in a machine shop where lengths were measured to 
the nearest thousandth of an inch, and machine-shop accuracy is nowhere near sufficient for work in optics, where we think in terms of wave lengths of light.

A true theory of physical measurements must therefore 
start from the assumption that they always contain errors. 
To what are these errors due? A few moments’ reflection 
shows that they fall into two general classes : 

A) Constant errors. These are due to inherent imperfections 
in the instruments of observation, and in the observer, but 
do not vary from one observation to another one bearing on 
the same object. We measure distances with a scale whose 
indicated lengths are too short. We observe the altitude of 
the sun with a sextant whose 0 is wrongly placed. We
measure a time interval with a chronograph which gains at 
a constant rate. We note the transit of a star across the 
hair-line when our personal equation causes us to record 
the phenomenon too soon. Errors of this general sort are 
inseparable from any sort of physical observation. Neither 
the instrument nor the observer can be perfected to such an 
extent as to eliminate them completely. All that we can do
is to estimate them as accurately as possible by measuring 
quantities of known value, or by other means. 

B) Accidental errors. These are supposed to arise from 
minute causes which vary from one observation to another ; 
they are fluctuating variations in the observer, the instru- 
ments, and the quantity observed. ‘To run through the same 
list as before, the coefficient of expansion of the scale may be 
different from that of the quantity measured, and the tem- 
perature may be somewhat above or below the mean. In 
reading the vernier of a sextant, the lines nearest coincidence 
will differ by a fraction of a hair’s breadth, one way or the 
other. The chronograph is not perfectly sealed from the outer 
air, and is influenced by variations of temperature and 
atmospheric pressure. The observer's nervous reaction is 
slightly faster or slower than usual, causing a variation in 
the rapidity of perceiving the passage of a star across the 
spider line. 

The fundamental problem with which we shall be occupied in the present chapter is to formulate a general mathematical theory of these accidental errors. At the outset it must be understood, beyond all possibility of misconception, that any such law will represent merely an approximation to the truth. There is no answer to the question, ‘Why should accidental errors in different sorts of observations obey the same law?’ There is no reason why they should, and undoubtedly they do not. The real question is: Can an
approximate law be found which is sufficiently accurate for
the purposes for which it is needed? The ultimate test 
for such a law will be ‘ how well does it work out in practice?’ 
If it work well, it is a good law, even if founded on assump- 
tions of doubtful validity. If it work badly, then it is of 
little importance, even though the mathematical deduction be 
highly instructive. The problem is to make the broadest 
and most plausible assumptions which will lead to a definite 
formula, and then to test that formula in practice. 


Assumption 1] The mean value of an accidental error is zero. 


It must be understood that this is just an arbitrary assump- 
tion, like those on which elementary geometry is based. 
That it is a plausible one is seen from considering the opposite 
case. For if this mean value were positive or negative, there
would be a constant tendency towards errors of the one sort
or the other, and this would count in with the constant
errors.


Assumption 2] The probability of an accidental error decreases 
as the numerical magnitude of that error increases. 

There is a so-called proof of this principle which, in reality, 
is based upon an assumption far less obvious than the assump- 
tion in question. The idea is that each accidental error is 
the result of an accumulation of atomic errors called ‘ funda- 
mental errors’, arising from small independent causes. These 
fundamental errors are supposed to be of the same size, and 
each has an equal chance of being positive or negative. The 
error actually committed represents the excess of positive 
over negative fundamental errors, or, vice versa, it is pro- 
portional to the discrepancy in a series of trials where there 
is a half chance of heads or tails, and we know already that the chance for a discrepancy is less and less as the latter increases numerically. The reason why we do not favour this method of treating the subject is that, in reality, it seems likely that these fundamental errors are a pure fiction, and that the actual errors committed do not arise in any such way.

We now suppose that we have a set of discordant observa- 
tions of the same quantity, after all constant errors have been 
eliminated or accounted for. The obvious fundamental 
question is this: What value shall we take as our best estimate 
of the quantity? We shall answer this question by making 
certain plausible mathematical assumptions about a quantity 
which we shall call the best value, and show how this latter
can then be found. It must be understood that these assump- 
tions are nothing but definitions of what the words ‘ best value’ 
mean. This method of procedure seems to have been first 
developed by the Italian astronomer Schiaparelli.* We shall 
do our best to motivate our assumptions as we go along. 


Postulate 1] When a number of discordant measures have been made on the same magnitude, and constant errors have been eliminated, the best value is a continuous function of the measures, which possesses first partial derivatives with respect to all the arguments.

The obvious objection has been made to this postulate that it was not at all evident why this function should be differentiable. Schimmack reached the same function by somewhat different postulates which did not include differentiability,† and his postulates have been shown by Beetle to be completely independent.‡ But Schimmack makes the assumption that the best value for $n+1$ observations is what it would be if each of the first $n$ were replaced by the best value for those $n$, and this does not seem at all self-evident either. It is certainly hard to believe that the best value is not a continuous function, and, if continuous, we can approach to it with any degree of accuracy we desire by means of differentiable functions, so that the inclusion of differentiability does not add much ‘to the load’.

* Schiaparelli, ‘Sul principio della media aritmetica’, Rendiconti del R. Istituto Lombardo, Series 2, vol. ii, 1868, and ‘Sur le principe de la moyenne arithmétique’, Astronomische Nachrichten, vol. lxxxvii, 1876 (Czuber, Entwickelung, cit., gives for this the erroneous date of 1895). This last is a refutation of a priority claim put forth in the same number by Stone, and based upon his paper, ‘On the most probable result which can be deduced from a number of direct determinations of assumed equal values’, Monthly Notices Royal Astronomical Soc., vol. xxxiii, 1873.

† Schimmack, ‘Der Satz vom arithmetischen Mittel’, Math. Annalen, vol. lxviii, 1909.

‡ Beetle, ‘On the Complete Independence of Schimmack’s Postulates’, ibid., vol. lxxvi, 1915.

Suppose that for one reason or another we decide, in the 
course of our observations, to change the scale or unit. We 
should naturally expect to produce thereby a corresponding 
change in our best value. This leads to 
Postulate 2] If all the observed values be multiplied by the same constant factor, the best value will be multiplied thereby.

We naturally look upon the best value as intrinsic in the 
observations, and independent of the origin whence measurements are made. This leads to

Postulate 3] If the same constant be added to each of the observed values, that constant will be added to the best value.

If the best value be a function only of the observations, 
the order in which they are taken must not affect the latter. 
This gives 
Postulate 4] When all the measures are equally trustworthy, the best value is a symmetric function of them.


With these postulates, it is easy to determine what sort of a function the best value is. Let the observed values, after constant errors have been eliminated, be $x_1, x_2, \dots, x_n$. The best value shall be $f(x_1, x_2, \dots, x_n)$. Since it is certainly possible that all the observed values should be equal, the function cannot become singular for every set of values $x_1 = x_2 = \dots = x_n$. Hence, by change of origin we may assume that the function and its derivatives all exist for the set of values $0, 0, \dots, 0$. By the law of the mean we have

$$k f(x_1, x_2, \dots, x_n) = f(kx_1, kx_2, \dots, kx_n) = f(0, 0, \dots, 0) + \sum_{i=1}^{i=n} k x_i \frac{\partial f}{\partial y_i},$$

where $0 < y_i < kx_i$ or $0 > y_i > kx_i$.

Putting $k = 0$,
$$f(0, 0, \dots, 0) = 0.$$

Dividing by $k$:
$$f(x_1, x_2, \dots, x_n) = \sum_{i=1}^{i=n} x_i \frac{\partial f}{\partial y_i}.$$

Since the left side is independent of $k$, on the right we may put $k = 0$, and $y_i = 0$.

If $\dfrac{\partial f}{\partial x_i} = a_i$ when $x_1 = x_2 = \dots = x_n = 0$,

$$f(x_1, x_2, \dots, x_n) = \sum_{i=1}^{i=n} a_i x_i.$$

Changing $x_i$ to $x_i + d$, and applying postulate 3]

$$\sum_{i=1}^{i=n} a_i = 1.$$

It is better to replace the coefficients $a_i$ by numbers proportional to them and write the best value

$$\bar{x} = \frac{p_1 x_1 + p_2 x_2 + \dots + p_n x_n}{p_1 + p_2 + \dots + p_n}. \tag{1}$$

Theorem 1] If a set of discordant measurements be taken of the same object, the best value to take, after constant errors have been eliminated, is a homogeneous linear function of the measures, where the sum of the coefficients is equal to unity.


We find immediately from postulate 4] 


Theorem 2] When all of the measurements are equally trust- 
worthy, the best value is their average. 


The coefficients are called weights, and it is evident in formula (1) that we are not primarily concerned with their actual values, but with their ratios. We shall also, hereafter, refer to the ‘best value’ as the weighted mean. Suppose, further, that $x_1$ was found as the average of $n_1$ standard observations, $x_2$ as the average of $n_2$ of them, and $x_n$ as the average of $n_n$. The weighted mean of all of the standard observations would be the expression (1) where the letter $p_i$ was replaced by the corresponding letter $n_i$.




Theorem 3] If it be possible to express each measurement as 
the average of a certain number of standard observa- 
tions, then the weights in the weighted mean are 
proportional to the number of standard observations 
in each case. 
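Theorem 3 can be illustrated with a few made-up readings. In the sketch below the function name `weighted_mean` and the three groups of standard observations are purely hypothetical; the point is only that weighting each group average by the size of its group reproduces the plain average of all the standard observations.

```python
from fractions import Fraction

def weighted_mean(values, weights):
    """Formula (1): sum of p_i * x_i divided by the sum of the p_i."""
    return sum(p * x for p, x in zip(weights, values)) / sum(weights)

# Hypothetical standard observations, collected in three groups of unequal size.
groups = [[10, 12], [11, 13, 15], [9]]

averages = [Fraction(sum(g), len(g)) for g in groups]   # the measurements x_i
counts = [len(g) for g in groups]                       # the numbers n_i, used as weights

all_observations = [x for g in groups for x in g]
overall_average = Fraction(sum(all_observations), len(all_observations))
print(weighted_mean(averages, counts), overall_average)
```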

We must now give a number of definitions which will be 
of frequent use in what follows. 


Definition] The positive square root of the mean value of the 
square of an error which may occur in a series of like 
observations is called the mean error. 


Definition] The mean value of the numerical measure of the error is called the average error.*


Definition] The positive number which there is a half chance 
that the numerical value of the error will not exceed is 
called the probable error. 

The reader should compare these definitions with those of 
mean, average, and probable discrepancies on pp. 49, 50. We
shall see later that when the errors are distributed according 
to the exponential law of Gauss, these three are constant 
multiples one of another. 

Let the real unknown value of the quantity we are measuring be $x$. The error of the weighted mean is

$$\frac{\displaystyle\sum_{i=1}^{i=n} p_i (x_i - x)}{\displaystyle\sum_{i=1}^{i=n} p_i}.$$

It is a little more convenient to write this in the form

$$\sum_{i=1}^{i=n} a_i (x_i - x), \qquad \sum_{i=1}^{i=n} a_i = 1.$$

The mean value of each of these terms is, by Assumption 1], equal to 0.

* There is no complete agreement as to these definitions. Some books use the term ‘mean error’ for that which we have called ‘average error’. Others call our mean error, which is really rather ill-named, ‘root mean square’, a ponderous title.


Let us assume that the unknown mean error of the measurement $x_i$ is

$$\frac{1}{k_i\sqrt{2}}.$$

The reasons for writing this clumsy expression will appear in the sequel. We wish to find the mean error of the weighted mean. We may apply Ch. IV, 5] and write for this the value

$$\frac{1}{K\sqrt{2}} = \sqrt{\frac{a_1^2}{2k_1^2} + \frac{a_2^2}{2k_2^2} + \dots + \frac{a_n^2}{2k_n^2}}. \tag{2}$$

When all of the measures are equally trustworthy, each $a_i$ is $1/n$, so that the mean error of the weighted mean is

$$\frac{1}{K\sqrt{2}} = \frac{1}{k\sqrt{2n}}.$$

Theorem 4] The mean error of the average of a number of equally trustworthy measurements is the mean error of a single measurement, divided by the square root of the number of measurements.
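Theorem 4 can be checked exactly on a toy error law. The sketch below assumes, purely for illustration, that an accidental error takes the values -1, 0, 1 with equal probability and that the measurements are independent; the function name is hypothetical. It compares squares of mean errors, to stay within exact fractions.

```python
from fractions import Fraction
from itertools import product

def mean_square_error_of_average(errors, n):
    """Mean value of the squared error of the average of n independent measurements."""
    outcomes = list(product(errors, repeat=n))   # every equally likely combination of errors
    return sum(Fraction(sum(o), n) ** 2 for o in outcomes) / len(outcomes)

errors = [-1, 0, 1]   # assumed equally likely errors with mean value zero
single = mean_square_error_of_average(errors, 1)
for n in (2, 3, 4):
    print(n, mean_square_error_of_average(errors, n), single / n)
```

Since the square of the mean error of the average is that of a single measurement divided by n, the mean error itself is divided by the square root of n.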

Let us see what values of the coefficients $a_i$ will minimize the mean error of the weighted mean. We must minimize

$$\sum_{i=1}^{i=n} \frac{a_i^2}{2k_i^2}, \qquad \sum_{i=1}^{i=n} a_i = 1.$$

This amounts to minimizing

$$\sum_{i=1}^{i=n} \frac{a_i^2}{2k_i^2} - \rho\left(\sum_{i=1}^{i=n} a_i - 1\right);$$

equating to 0 the partial derivative to $a_i$ we have

$$a_i = \rho k_i^2, \qquad a_i = \frac{k_i^2}{\displaystyle\sum_{j=1}^{j=n} k_j^2}.$$

When we are in the case where $x_i$ is the average of $n_i$ standard measurements this amounts to putting

$$2k_i^2 = 2k^2 n_i, \qquad \therefore\ p_i \propto n_i,$$

and gives the system of weighting already found. It is natural, then, to make the minimizing of this mean value a general principle, and state:


Postulate 5] The weights in the weighted mean are those coefficients which will make the mean error of this expression a minimum.


Theorem 5] The weights in the weighted mean are inversely proportional to the squares of the mean errors of the individual measurements.
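A short numerical illustration of Theorem 5 follows. It is a sketch under stated assumptions: the function names and the three mean errors are hypothetical, and the measurements are taken to be independent, so that the squared mean error of a linear combination is the sum of the squared coefficients times the squared mean errors.

```python
from fractions import Fraction

def squared_mean_error(coeffs, mean_errors):
    """Square of the mean error of sum(a_i * x_i) for independent measurements."""
    return sum(a * a * m * m for a, m in zip(coeffs, mean_errors))

def optimal_coefficients(mean_errors):
    """Coefficients proportional to the inverse squares of the mean errors, with sum 1."""
    weights = [1 / (m * m) for m in mean_errors]
    total = sum(weights)
    return [w / total for w in weights]

mean_errors = [Fraction(1), Fraction(2), Fraction(1, 2)]   # hypothetical values
best = optimal_coefficients(mean_errors)
equal = [Fraction(1, 3)] * 3
nudged = [best[0] + Fraction(1, 50), best[1] - Fraction(1, 50), best[2]]

print(best, squared_mean_error(best, mean_errors))
print(squared_mean_error(equal, mean_errors), squared_mean_error(nudged, mean_errors))
```

Any other set of coefficients summing to unity, such as the equal or the slightly nudged ones above, gives a larger mean error.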

The trouble with all of this work with the mean errors 
is that we do not really know anything at all about the 
errors actually committed. If we did, we should know 
the true value sought. The best we can do is to manipulate 
certain observed quantities nearly equal to the errors. 


Definition] The difference between a measurement and the 
weighted mean is called a residual error, or, more 
briefly, a residual. 


The residual corresponding to the measurement $x_i$ is

$$\epsilon_i = x_i - \bar{x} = (x_i - x) - (\bar{x} - x) = (x_i - x) - \sum_{j=1}^{j=n} a_j (x_j - x).$$

Theorem 6] The mean value of a residual is 0.

The quantity in which we are particularly interested is

$$\frac{\displaystyle\sum_{i=1}^{i=n} p_i \epsilon_i^2}{\displaystyle\sum_{i=1}^{i=n} p_i}. \tag{3}$$

When we are under the hypothesis of 3] this is the average of certain observed quantities, and by Tchebycheff’s inequality, may then safely be replaced by its mean value. This leads naturally to:


Assumption 3] The mean value of expression (3) may be replaced by its observed value.

Let us calculate this mean value.


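$$\epsilon_i = x_i - \bar{x} = \frac{\sum_j p_j x_i - \sum_j p_j x_j}{\sum_j p_j} = \frac{{\sum_j}' p_j (x_i - X) - {\sum_j}' p_j (x_j - X)}{\sum_j p_j}.$$

The notation $\sum'$ means that the term with subscript $i$ is
lacking. The mean value of each individual term is 0; hence
the mean value of $\epsilon_i^2$ is

$$\frac{\big({\sum_j}' p_j\big)^2 \dfrac{1}{2k_i^2} + {\sum_j}' \dfrac{p_j^2}{2k_j^2}}{\big(\sum_j p_j\big)^2}.$$

Now, by theorem 5]

$$\frac{1}{2k_i^2} = \frac{1}{p_i} \cdot \frac{1}{2k^2}.$$

Hence the mean value of $\epsilon_i^2$ is

$$\frac{{\sum_j}' p_j \Big[ \big({\sum_j}' p_j\big) \dfrac{1}{p_i} + 1 \Big] \dfrac{1}{2k^2}}{\big(\sum_j p_j\big)^2} = \frac{{\sum_j}' p_j}{2k^2\, p_i \sum_j p_j}.$$

Equating the mean value of (3) to its observed value,

$$\frac{\sum_{i=1}^{n} p_i \epsilon_i^2}{\sum_{i=1}^{n} p_i} = \frac{n-1}{2k^2 \sum_{i=1}^{n} p_i},$$

$$\frac{1}{k\sqrt{2}} = \sqrt{\frac{\sum_{i=1}^{n} p_i \epsilon_i^2}{n-1}}, \qquad \frac{1}{k_i\sqrt{2}} = \sqrt{\frac{\sum_{j=1}^{n} p_j \epsilon_j^2}{p_i (n-1)}}.$$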
We thus get from (2) 


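Theorem 7] The mean error of the weighted mean of a set of
measurements whose weights are $p_1, p_2, \dots p_n$, while the
corresponding residuals are $\epsilon_1, \epsilon_2, \dots \epsilon_n$, is

$$\frac{1}{K\sqrt{2}} = \sqrt{\frac{\sum_{i=1}^{n} p_i \epsilon_i^2}{(n-1) \sum_{i=1}^{n} p_i}}. \qquad (4)$$


Theorem 8] The mean error of a measure of weight $p_i$ under
the same circumstances is

$$\frac{1}{k_i\sqrt{2}} = \sqrt{\frac{\sum_{j=1}^{n} p_j \epsilon_j^2}{(n-1)\, p_i}}. \qquad (5)$$

Theorem 9] When each weighted observation is the average of
a number of standard measures, the mean error
of a standard measurement is

$$\frac{1}{k\sqrt{2}} = \sqrt{\frac{\sum_{i=1}^{n} p_i \epsilon_i^2}{n-1}}. \qquad (6)$$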
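
The three mean errors just stated may be computed together from the residuals. The sketch below is ours, not the author's; it assumes the reading of Theorems 7] to 9] in which the squared mean errors are the weighted sum of squared residuals divided by (n-1) times the sum of the weights, by (n-1) times the individual weight, and by (n-1) respectively. The sample values and weights are hypothetical.

```python
import math


def mean_errors_from_residuals(values, weights):
    """Return (weighted mean, mean error of the weighted mean,
    mean errors of the individual measures, mean error of a standard measure)."""
    n = len(values)
    if n < 2 or n != len(weights):
        raise ValueError("need at least two values and matching weights")
    total_weight = sum(weights)
    mean = sum(p * x for p, x in zip(weights, values)) / total_weight
    # Weighted sum of the squared residuals, the numerator of expression (3).
    s = sum(p * (x - mean) ** 2 for p, x in zip(weights, values))
    of_mean = math.sqrt(s / ((n - 1) * total_weight))           # Theorem 7
    of_each = [math.sqrt(s / ((n - 1) * p)) for p in weights]   # Theorem 8
    of_standard = math.sqrt(s / (n - 1))                        # Theorem 9
    return mean, of_mean, of_each, of_standard


# Hypothetical data: three measures, the second being of double weight.
mean, of_mean, of_each, of_standard = mean_errors_from_residuals(
    [10.0, 12.0, 11.0], [1.0, 2.0, 1.0]
)
```

A measure of unit weight has here the same mean error as a standard measurement, which is as it should be.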

We saw in the work which led up to postulate 4] that
when the given observations are equally trustworthy, the
average is that weighted mean which will have the least
mean error. Moreover, the sum of the squares of the actual
errors

$$\sum_{i=1}^{n} (x_i - X)^2$$

will be a minimum if

$$X = \frac{1}{n} \sum_{i=1}^{n} x_i,$$

and this gives additional reason to choose the average as
the best value. At the same time, there arise cases where the
observed values group themselves somewhat asymmetrically
about the average, and the question arises whether it be not
well to take a best value which will minimize some other
function of the observed measurements. For instance, what
value will minimize the sum of the numerical values of
the errors?

If an assumed value lie between two observed values,
the sum of the numerical values of its divergences from the
two is equal to their numerical difference. If it do not
lie between the two, this sum increases as the assumed value
recedes from the observed values, for it is equal to their
numerical difference, plus twice the divergence from the nearest
one. Let the observed values be $x_0, x_1, x_2, \dots x_n$ arranged
in ascending order of magnitude, and let

$$r_i = x_i - x_{i-1}.$$

Suppose that we take a value $x$ where

$$x_{k-1} \le x \le x_k, \qquad 2k < n.$$

The sum of the numerical divergences of $x$ from the different
observed values will be

$$(r_1 + r_n) + 2(r_2 + r_{n-1}) + 3(r_3 + r_{n-2}) + \dots + k(r_k + r_{n-k+1}) + (k+1)\, r_{n-k} + \dots + (n - 2k + 1)(x_k - x).$$

The value $x_k$ is that value in the interval in question which
will make this a minimum. In the same way, if $2k > n$ the
sum would be a minimum if $x_{k-1} = x$.

If $x$ lie in the middle interval, the sum will be the same
throughout.


Definition] The middle term in order of magnitude of an
odd number of terms and the average of the middle
terms of an even number of terms is called the median
of the series.


Problems.

1. An angle was measured by a theodolite (mean error 46.5″) to be
29° 13′ 40″, and by a transit (mean error 25.3″) to be 29° 13′ 24″. Find
best value and its mean error.

2. A distance was measured as follows:

A) with steel tape 741.17; 741.09; 741.22; 741.12; 741.01,
B) with chain 741.2; 741.4; 741.0; 741.3; 741.1.
Find best value and mean error.


We thus have a theorem due, apparently, to Fechner.*


Theorem 10] The sum of the numerical values of the
divergences of a number from a given series of numbers
will be a minimum if the number in question be the
median.
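
Theorem 10] is easily verified numerically. The following Python sketch is an illustration of ours; the observed values and the grid of candidate values are hypothetical. It searches a fine grid for the number whose total numerical divergence from the series is least, and compares the result with the median.

```python
from statistics import median


def total_abs_divergence(x, observed):
    """Sum of the numerical values of the divergences of x from the series."""
    return sum(abs(x - v) for v in observed)


# Hypothetical series, grouped asymmetrically about its average (8.0).
observed = [2.0, 3.0, 7.0, 8.0, 20.0]
m = median(observed)

# Candidate values 0.0, 0.1, ..., 25.0; the least total is sought among them.
candidates = [step / 10 for step in range(0, 251)]
best = min(candidates, key=lambda c: total_abs_divergence(c, observed))

at_median = total_abs_divergence(m, observed)
at_average = total_abs_divergence(sum(observed) / len(observed), observed)
```

In this series the median gives a smaller total than the average, as the theorem requires.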


Another value occasionally used, especially in statistical 
work, is the mode, which is the point of accumulation of the 
given set of measures. 


§ 2. The Law of Error. 


We saw at the beginning of the present chapter that the 
title of the present section is essentially a misnomer. There 
is no such thing in Nature as a law of error, i.e. a fixed 
principle according to which accidental errors are always 
distributed. For mathematical purposes we desire a continuous 
function, with a certain number of continuous derivatives, 
which will express the probability for an error of given 
magnitude, but in fact there is a certain number which 
represents the maximum possible numerical error. The prob- 
ability for an error very close to this will be finite, the 
probability for any numerically greater error is rigorously 
zero. Consequently no analytic function whose argument 
runs from $-\infty$ to $\infty$ can fit the case for all values of that
argument. 

We mean, then, by the law of error a mathematical formula, 
reached by plausible reasoning, which in practice will give 
approximately the proportion of accidental errors in any 
appropriate interval. To make the law as plausible as 
possible, we shall start from the broadest assumptions that 
will give what we want, a set considerably broader than 
that usually taken, but the acid test will lie in the question : 
do observed errors conform with any reasonable degree of 
closeness to the law which has been deduced ? 


* Fechner, ‘Ueber den Ausgangswerth der kleinsten Abweichungen’,
Sitzungsberichte der K. Akademie der Wissenschaft zu Leipzig, vol. xi, 1874, p. 29.


Assumption 4] The a priori probability that a quantity to be
observed shall have a value in the infinitesimal interval
$x \pm \tfrac{1}{2}dx$, $x$ being in a certain continuous region $S$, will
differ by an infinitesimal of higher order from $f(x)\,dx$,
where $f$ is a function, single-valued and analytic,
throughout the whole reach of possible values.


Assumption 5] The probability that a quantity whose true
value is $X$ should under specified conditions be observed
to have after the removal of constant errors a value
in the infinitesimal region $x \pm \tfrac{1}{2}dx$, where $x$ is a point
of $S$, will differ by an infinitesimal of higher order
from $\phi(X, x)\,dx$ where the function $\phi$ and its partial
derivatives of the first two orders are continuous,
and where its value is independent of the choice of
origin.


Assumption 6] If the infinitesimal increment $dx$ be sufficiently
small the probability that the true value lies in the
region $x \pm \tfrac{1}{2}dx$ is a maximum when $x$ is the weighted
average of the observed values.

It is evident that these assumptions have not absolutely 
axiomatic force, yet all are reasonably plausible. To assume 
that the function $\phi$ is independent of the origin is natural,
for we expect that accidental errors will arise from physical 
causes, and not from the position of the 0 on the recording 
instrument. As for the last assumption, our continuous 
function must have a maximum somewhere in the region, and 
the weighted average seems as likely to give that maximum 
as any other number we could naturally think of. 

Let us proceed to deduce our law from these assumptions.
Since $\phi$ is independent of the origin

$$\phi(X + k,\ x + k) = \phi(X, x).$$

Putting $k = -X$,

$$\phi(X, x) = \phi(0,\ x - X) = \phi(x - X) = \phi(\xi).$$

It appears, therefore, that $\phi$ is a function of the error alone.
This fact, which is sometimes assumed in so many words, has
given rise to criticism, yet it follows at once from our plausible
assumption about the independence of the origin.* As a
matter of notation, let us write

Observed values    $x_1, x_2, \dots x_n$.
Weights            $p_1, p_2, \dots p_n$.
Weighted average   $\bar{x} = \sum p_i x_i / \sum p_i$.
True value         $X$.
Observed errors    $\xi_i = x_i - X$.
Residual errors    $\epsilon_i = x_i - \bar{x} = \xi_i - \dfrac{\sum p_j \xi_j}{\sum p_j}$.

The probability that these observations arose from a quantity
whose true value is $X$ is given by Bayes’ formula for
continuous probability, developed in Ch. VI, p. 98,

$$\frac{f(X)\, \phi(x_1 - X)\, \phi(x_2 - X) \cdots \phi(x_n - X)\, dX}{\int f(z)\, \phi(x_1 - z)\, \phi(x_2 - z) \cdots \phi(x_n - z)\, dz}.$$

The integration in the denominator is supposed to be
extended throughout the whole of the region $S$. This
expression will be a maximum with the logarithm of the
numerator. Equating the logarithmic derivative to 0,

$$\frac{d \log f}{dX} - \frac{d \log \phi(\xi_1)}{d\xi_1} - \frac{d \log \phi(\xi_2)}{d\xi_2} - \dots - \frac{d \log \phi(\xi_n)}{d\xi_n} = 0, \qquad (8)$$

$$p_1 \xi_1 + p_2 \xi_2 + \dots + p_n \xi_n = 0. \qquad (9)$$

The first term is independent of the observed values.
Suppose that we are so lucky as to get exactly the right
value each time, an allowable case,

$$\frac{d \log f}{dX} - n \left[ \frac{d \log \phi(\xi)}{d\xi} \right]_{\xi = 0} = 0.$$

Now the function $f$ is independent of $n$, hence

$$\left[ \frac{d \log \phi(\xi)}{d\xi} \right]_{\xi = 0} = 0, \qquad \frac{d \log f}{dX} = 0, \qquad f = \text{const.} \qquad (10)$$


We have thus removed the troublesome a priori probabilities 
from our path. 


* Bertrand, loc. cit., p. 177; Poincaré, loc. cit., p. 152. 




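Going back to the general case, assume that $\bar{x}$ remains
fixed, while $x_1, x_2, \dots x_n$ take infinitesimal increments.

$$\frac{d}{d\xi_1}\!\left[ \frac{d \log \phi(\xi_1)}{d\xi_1} \right] d\xi_1 + \frac{d}{d\xi_2}\!\left[ \frac{d \log \phi(\xi_2)}{d\xi_2} \right] d\xi_2 + \dots + \frac{d}{d\xi_n}\!\left[ \frac{d \log \phi(\xi_n)}{d\xi_n} \right] d\xi_n = 0,$$

$$p_1\, d\xi_1 + p_2\, d\xi_2 + \dots + p_n\, d\xi_n = 0. \qquad (12)$$

One of these equations in the variables $d\xi_1, d\xi_2 \dots$ holds
whenever the other does, hence

$$\frac{d}{d\xi_i}\!\left[ \frac{d \log \phi(\xi_i)}{d\xi_i} \right] = -2 l p_i.$$

Integrating once

$$\frac{d \log \phi(\xi_i)}{d\xi_i} = -2 l p_i \xi_i + H,$$

and the constant $H$ is seen to be 0 by (8) and (9).
Integrating again

$$\phi(\xi_i) = r e^{-l p_i \xi_i^2}. \qquad (13)$$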

It is evident, on the face of things, that this formula cannot 
be strictly correct outside of certain definite limits, for it 
gives a finite probability for an obviously impossible error. 
It is also clear that the statement that f is constant could not 
hold from infinity to infinity, as that would involve the 


ridiculous conclusion that $\int_{-\infty}^{\infty} c\, dz = 1$. We note, however,


that: 1) it seems plausible to assume that f is constant 
throughout a certain region, and drops to zero rapidly outside ; 
2) expression (13) becomes rapidly very small. The effect 
called for by the first of these will be sensibly produced by 
assuming (13) to hold everywhere. 


Assumption 7] For the purpose of calculating constants 
formula (13) may be assumed universally true. 


Assumption 8] For the purpose of calculating constants, when
the observations are all of equal weight, the mean
value of $\epsilon^2 = \dfrac{n-1}{n}\, \text{M.V.}\ \xi^2$ may be replaced by the
observed average.


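Replacing this latter by the familiar expression $1/2k_i^2$,

$$\int_{-\infty}^{\infty} r e^{-l p_i \xi_i^2}\, d\xi_i = 1, \qquad r \int_{-\infty}^{\infty} \xi_i^2 e^{-l p_i \xi_i^2}\, d\xi_i = \frac{1}{2k_i^2}.$$

Putting $\xi_i \sqrt{l p_i} = t$,

$$\frac{2r}{\sqrt{l p_i}} \int_0^{\infty} e^{-t^2}\, dt = 1, \qquad \frac{2r}{(l p_i)^{3/2}} \int_0^{\infty} t^2 e^{-t^2}\, dt = \frac{1}{2k_i^2},$$

$$l p_i = k_i^2, \qquad r = \frac{k_i}{\sqrt{\pi}}.$$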


Dropping the subscript, we have finally : 


Gauss’s Exponential Law of Error.* The probability, under
Assumptions 1-6, that the observed measurement of
a quantity shall have an accidental error in the
infinitesimal region $\xi \pm \tfrac{1}{2}d\xi$ differs by an infinitesimal
of higher order from the expression

$$\frac{k}{\sqrt{\pi}}\, e^{-k^2 \xi^2}\, d\xi, \qquad (14)$$

where the mean error of a single observation is

$$\frac{1}{k\sqrt{2}}. \qquad (15)$$

It is evident that of all of our assumptions, the least
plausible is 6]. It has been suggested that it would be more 
natural, not to assume that the weighted mean gave the 
greatest possible probability to the observed series, but that 
it was the mean of all possible values, in view of the ones 
that had been observed. This can be carried through, but 
the calculation is long. 

* We have given essentially Gauss’s first deduction, which appears in all
text-books on Least Squares. The original is in his ‘Theoria Motus Corporum
Coelestium’. See his Collected Works, vol. vii, Hamburg, 1809, p. 232. But
Gauss assumes explicitly that $\phi$ is a function of the error alone, and that
$f$ is a constant.
† Poincaré, loc. cit., p. 156.

The form of the probability function is not in the least
surprising. We saw in discussing Assumption 2] that if we
assumed that the actual accidental error committed was the
surplus of positive over negative elementary errors, or vice
versa, that this assumption would be fulfilled. The error
would be the discrepancy in a series of trials where there
was an even chance for success or failure, and our present
formula (14) is merely a restatement of formula (10) of
Ch. III. This method of reaching the probability function
is beyond a peradventure much the simplest;* the only trouble
is, as we have already seen, there is no real reason to believe
that such things as elementary errors really exist in practice.

The fundamental constant k that appears in the formula 
is called the precision. It is inversely proportional to the 
mean error of a single observation, and directly proportional 
to the square root of the weight that should be attached to 
that observation in combining it with others. In actual 
practice, however, especially in the United States, it is more 
customary to give the probable error than the mean error or 
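the precision. To find the probable error $\rho$ we put

$$\frac{2k}{\sqrt{\pi}} \int_0^{\rho} e^{-k^2 \xi^2}\, d\xi = \frac{1}{2}.$$

Let $k\xi = t$;

$$\Theta(k\rho) = \tfrac{1}{2}, \qquad k\rho = 0.4769, \qquad \rho = 0.4769 \left( \frac{1}{k} \right). \qquad (16)$$

Again, to find the average error, since positive and negative
errors are equally likely, we have

$$\text{Av. error} = \frac{2k}{\sqrt{\pi}} \int_0^{\infty} \xi e^{-k^2 \xi^2}\, d\xi = \frac{2}{k\sqrt{\pi}} \int_0^{\infty} t e^{-t^2}\, dt,$$

$$\text{Av. error} = \frac{1}{k\sqrt{\pi}}. \qquad (17)$$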


* This method of deduction is apparently due to Hagen, Grundzüge der
Wahrscheinlichkeitsrechnung, Berlin, 1837.


Theorem 11] If a set of measurements follow the Law of 
Gauss, the mean error, probable error, and average 
error are constant multiples one of another. 
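
The constant multiples of Theorem 11] may be recovered numerically. The sketch below is ours; it assumes that the function written Θ in the text is the ordinary error function, available in Python as `math.erf`, and solves Θ(t) = ½ by bisection. The helper names are hypothetical.

```python
import math


def solve_erf(target, lo=0.0, hi=6.0, iterations=200):
    """Bisection for the value t with erf(t) = target, where 0 <= target < 1."""
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if math.erf(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def error_measures(k):
    """Mean, probable and average error for a Gauss law of precision k."""
    mean_error = 1.0 / (k * math.sqrt(2.0))
    probable_error = solve_erf(0.5) / k           # k * rho = 0.4769...
    average_error = 1.0 / (k * math.sqrt(math.pi))
    return mean_error, probable_error, average_error


# The ratios do not depend on the (here arbitrary) precision chosen.
mean_e, probable_e, average_e = error_measures(2.5)
ratio_probable = probable_e / mean_e
ratio_average = average_e / mean_e
```

The two ratios come out close to 0.6745 and 0.798, the multipliers used in forming the probable error and the average error from the mean error.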

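Suppose that we have two independent quantities $x_1$ and $x_2$
of such a nature that the measurements of each follow the
law of Gauss, their respective precisions being $k_1$ and $k_2$.
What will be the law of error for the expression

$$x_1' = a_1 x_1 + a_2 x_2,$$
$$X_1' = a_1 X_1 + a_2 X_2,$$
$$\xi_1' = a_1 \xi_1 + a_2 \xi_2.$$

The probability that $\xi_1'$ should be in a given infinitesimal
region differs by an infinitesimal of higher order from

$$\frac{k_1 k_2}{\pi} \iint e^{-k_1^2 \xi_1^2}\, d\xi_1\; e^{-k_2^2 \xi_2^2}\, d\xi_2 = \frac{k_1 k_2}{\pi} \iint e^{-(k_1^2 \xi_1^2 + k_2^2 \xi_2^2)}\, d\xi_1\, d\xi_2.$$

This integral is extended over so much of the $\xi_1, \xi_2$ plane
as will make $\xi_1'$ lie in the infinitesimal region demanded.
We proceed to change variables in this integral, putting

$$\xi_1' = a_1 \xi_1 + a_2 \xi_2,$$
$$\xi_2' = b_1 \xi_1 + b_2 \xi_2.$$

We will choose $b_1$ and $b_2$ in such a way that when
$k_1^2 \xi_1^2 + k_2^2 \xi_2^2$
is expressed in terms of $\xi_1'$ and $\xi_2'$, there will be no product
term. This gives

$$k_1^2 b_2 a_2 + k_2^2 b_1 a_1 = 0,$$
$$\xi_1' = a_1 \xi_1 + a_2 \xi_2,$$
$$\xi_2' = k_1^2 a_2 \xi_1 - k_2^2 a_1 \xi_2,$$
$$k_1^2 \xi_1^2 + k_2^2 \xi_2^2 = \frac{k_1^2 k_2^2 (\xi_1')^2 + (\xi_2')^2}{k_1^2 a_2^2 + k_2^2 a_1^2},$$
$$\left| \frac{\partial(\xi_1, \xi_2)}{\partial(\xi_1', \xi_2')} \right| = \frac{1}{k_1^2 a_2^2 + k_2^2 a_1^2}.$$

Hence our integral above is

$$\frac{k_1 k_2}{\pi (k_1^2 a_2^2 + k_2^2 a_1^2)}\; e^{-\frac{k_1^2 k_2^2 (\xi_1')^2}{k_1^2 a_2^2 + k_2^2 a_1^2}}\, d\xi_1' \int_{-\infty}^{\infty} e^{-\frac{(\xi_2')^2}{k_1^2 a_2^2 + k_2^2 a_1^2}}\, d\xi_2' = \frac{k_1 k_2}{\sqrt{\pi (k_1^2 a_2^2 + k_2^2 a_1^2)}}\; e^{-\frac{k_1^2 k_2^2 (\xi_1')^2}{k_1^2 a_2^2 + k_2^2 a_1^2}}\, d\xi_1'.$$

Putting

$$\frac{1}{(k_1')^2} = \frac{a_1^2}{k_1^2} + \frac{a_2^2}{k_2^2},$$

we get

$$\frac{k_1'}{\sqrt{\pi}}\, e^{-(k_1')^2 (\xi_1')^2}\, d\xi_1'.$$

Our new series of observations follow the law of Gauss
with the precision $k_1'$. We thus reach, by mathematical
induction,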


Theorem 12] If $x_1, x_2, \dots x_n$ be $n$ independent quantities
whose measures follow the law of Gauss with the
respective precisions $k_1, k_2, \dots k_n$, the quantity

$$a_1 x_1 + a_2 x_2 + \dots + a_n x_n$$

obeys the same law with the precision $k'$ where

$$\frac{1}{(k')^2} = \sum_{i=1}^{n} \frac{a_i^2}{k_i^2}. \qquad (18)$$

Where the quantity under observation is the weighted
mean we have

$$a_i = \frac{k_i^2}{\sum_j k_j^2}.$$


Theorem 13] If a series of observations obey the law of Gauss
with the precisions $k_1, k_2, \dots k_n$, respectively, their
weighted mean will obey the same law with a precision
$K$, where

$$K = \sqrt{\sum_j k_j^2}. \qquad (19)$$
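
The rule for the precision of a linear combination lends itself to a brief numerical check. The following sketch is an illustration of ours, with hypothetical function names and precisions; it takes the reciprocal of the squared precision to be the sum of the squared coefficients each divided by the corresponding squared precision, and then applies this to the weighted mean, whose coefficients are proportional to the squares of the precisions.

```python
import math


def precision_of_combination(coefficients, precisions):
    """Precision k' of a_1 x_1 + ... + a_n x_n, from 1/k'^2 = sum(a_i^2 / k_i^2)."""
    if len(coefficients) != len(precisions) or not precisions:
        raise ValueError("need equally many coefficients and precisions")
    inverse_square = sum((a / k) ** 2 for a, k in zip(coefficients, precisions))
    return 1.0 / math.sqrt(inverse_square)


def precision_of_weighted_mean(precisions):
    """Precision of the weighted mean, using coefficients a_i = k_i^2 / sum(k_j^2)."""
    total = sum(k * k for k in precisions)
    coefficients = [k * k / total for k in precisions]
    return precision_of_combination(coefficients, precisions)


# Hypothetical precisions 3 and 4: the weighted mean should have precision 5.
K = precision_of_weighted_mean([3.0, 4.0])

# The plain average of two observations of precision 2 has precision 2 * sqrt(2).
k_average = precision_of_combination([0.5, 0.5], [2.0, 2.0])
```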


A residual is an observation which is linear in the given
system of observations. Its mean value and its true value
are 0 by 6]. If $\epsilon_i$ be the $i$th residual,

$$\epsilon_i = \frac{(\sum p_j - p_i)\, x_i - p_1 x_1 - p_2 x_2 \dots}{\sum p_j} = \frac{(\sum k_j^2 - k_i^2)\, x_i - k_1^2 x_1 - k_2^2 x_2 \dots}{\sum k_j^2}.$$

For the precision $k_i'$ we have

$$\frac{1}{(k_i')^2} = \frac{1}{k_i^2} - \frac{1}{\sum_j k_j^2}.$$


Theorem 14] If a set of measurements follow the law of Gauss
with the respective precisions $k_1, k_2, \dots k_n$, the $i$th
residual follows that law with the precision $k_i'$ where

$$\frac{1}{(k_i')^2} = \frac{1}{k_i^2} - \frac{1}{\sum_j k_j^2}. \qquad (20)$$


Theorem 15] The precision of a residual of $n$ observations of
precision $k$ is

$$k' = k \sqrt{\frac{n}{n-1}}. \qquad (21)$$


To find $k'$ replace $(n-1)$ by $n$ in (6).
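For convenient reference, let us make a table of the results
of (4), (5), (6), (16), and (17).

TABLE.

Given $n$ observations of weights $p_1, p_2, \dots p_n$.
Let the corresponding residuals be $\epsilon_1, \epsilon_2, \dots \epsilon_n$.

$$\begin{array}{l|cccc}
 & \text{Standard obs.} & \text{Average of } n & \text{Obs. weight } p_i & \text{Weighted mean} \\ \hline
\text{Precision} & \sqrt{\dfrac{n-1}{2 \sum \epsilon_j^2}} & \sqrt{\dfrac{n(n-1)}{2 \sum \epsilon_j^2}} & \sqrt{\dfrac{(n-1)\, p_i}{2 \sum p_j \epsilon_j^2}} & \sqrt{\dfrac{(n-1) \sum p_j}{2 \sum p_j \epsilon_j^2}} \\
\text{Mean error} & \sqrt{\dfrac{\sum \epsilon_j^2}{n-1}} & \sqrt{\dfrac{\sum \epsilon_j^2}{n(n-1)}} & \sqrt{\dfrac{\sum p_j \epsilon_j^2}{(n-1)\, p_i}} & \sqrt{\dfrac{\sum p_j \epsilon_j^2}{(n-1) \sum p_j}} \\
\text{Probable error} & 0.6745 \sqrt{\dfrac{\sum \epsilon_j^2}{n-1}} & 0.6745 \sqrt{\dfrac{\sum \epsilon_j^2}{n(n-1)}} & 0.6745 \sqrt{\dfrac{\sum p_j \epsilon_j^2}{(n-1)\, p_i}} & 0.6745 \sqrt{\dfrac{\sum p_j \epsilon_j^2}{(n-1) \sum p_j}} \\
\text{Average error} & 0.798 \sqrt{\dfrac{\sum \epsilon_j^2}{n-1}} & 0.798 \sqrt{\dfrac{\sum \epsilon_j^2}{n(n-1)}} & 0.798 \sqrt{\dfrac{\sum p_j \epsilon_j^2}{(n-1)\, p_i}} & 0.798 \sqrt{\dfrac{\sum p_j \epsilon_j^2}{(n-1) \sum p_j}}
\end{array}$$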


When it comes to making these various summations, there
are one or two simple expedients which will materially lighten
the labour of a computer not provided with an adding
machine. The latter is practically indispensable when the
mass of data is large. Let the observed values, as usual, be
$x_1, x_2, \dots x_n$. Arrange these in order of magnitude. Let
the weighted mean be

$$\bar{x} = \frac{p_1 x_1 + p_2 x_2 + \dots + p_n x_n}{p_1 + p_2 + \dots + p_n}.$$

Choose any convenient number $x_0$, either the numerically
smallest, or the median, or any that may seem helpful.

$$\epsilon_i = x_i - \bar{x} = (x_i - x_0) + (x_0 - \bar{x}),$$

$$\epsilon_i^2 = (x_i - \bar{x})^2 = (x_i - x_0)^2 + 2 (x_i - x_0)(x_0 - \bar{x}) + (x_0 - \bar{x})^2;$$

$$\bar{x} - x_0 = \frac{\sum p_i (x_i - x_0)}{\sum p_i}, \qquad (22)$$

$$\frac{\sum p_i \epsilon_i^2}{\sum p_i} = \frac{\sum p_i (x_i - x_0)^2}{\sum p_i} - (\bar{x} - x_0)^2. \qquad (23)$$

Let the reader prove the following formula, of use later,

$$\frac{\sum p_i (x_i - \bar{x})(y_i - \bar{y})}{\sum p_i} = \frac{\sum p_i (x_i - x_0)(y_i - y_0)}{\sum p_i} - (x_0 - \bar{x})(y_0 - \bar{y}).$$

These devices are particularly useful when the observations 
and weights are integers, but the weighted mean is not. 
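
The expedient of a convenient origin may be tried in exact arithmetic. The sketch below is ours, with hypothetical integer observations and weights; it forms the weighted mean and the weighted mean square of the residuals from sums taken about an assumed origin, on the understanding that the latter equals the mean square about the origin less the square of the distance between the origin and the weighted mean. Python's `fractions.Fraction` keeps every quantity exact.

```python
from fractions import Fraction


def via_origin(values, weights, origin):
    """Weighted mean and weighted mean square residual, computed from
    integer sums taken about a convenient origin."""
    total = sum(weights)
    first = Fraction(sum(p * (x - origin) for p, x in zip(weights, values)), total)
    second = Fraction(sum(p * (x - origin) ** 2 for p, x in zip(weights, values)), total)
    mean = origin + first                 # the weighted mean, by (22)
    mean_square = second - first ** 2     # sum(p e^2) / sum(p), by (23)
    return mean, mean_square


def directly(values, weights):
    """The same two quantities computed straight from the definitions."""
    total = sum(weights)
    mean = Fraction(sum(p * x for p, x in zip(weights, values)), total)
    mean_square = sum(p * (x - mean) ** 2 for p, x in zip(weights, values)) / total
    return mean, mean_square


# Hypothetical integer observations and weights; the origin 10 is arbitrary.
values, weights = [13, 11, 10, 8], [1, 2, 1, 3]
shortcut = via_origin(values, weights, origin=10)
direct = directly(values, weights)
```

Only small integers are summed in the first method, the single fraction appearing at the last step.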

The labour of calculation may be further reduced as
follows. Let us first recall Tchebycheff’s principle whereby
an average will probably be close to its mean value. Then
the expression

$$\frac{\sum p_i |\xi_i|}{\sum p_i \sqrt{n-1}}$$

will be close to the average error, and the expression

$$\frac{0.6745}{0.798} \cdot \frac{\sum p_i |\xi_i|}{\sum p_i \sqrt{n-1}}$$

will be close to the probable error. Replacing the unknown
$\xi_i$’s by the residuals we get:

Approximate formula for probable error of weighted mean

$$0.845\, \frac{\sum p_i |\epsilon_i|}{\sum p_i \sqrt{n-1}}. \qquad (24)$$

The other probable errors may be easily calculated from this.

The approximate precision will be found from the equation

$$\frac{1}{K} = 1.77\, \frac{\sum p_i |\epsilon_i|}{\sqrt{n-1}\, \sum p_i}. \qquad (25)$$

When all of the measurements have the same weight, the
precision of the average will be given by

$$\frac{1}{K} = 1.77\, \frac{\sum |\epsilon_i|}{n \sqrt{n-1}}. \qquad (26)$$

The precision of a single observation will be $k$, where

$$\frac{1}{k} = 1.77\, \frac{\sum |\epsilon_i|}{\sqrt{n(n-1)}}, \qquad (27)$$

and the precision of a residual will be $k'$ where

$$\frac{1}{k'} = 1.77\, \frac{\sum |\epsilon_i|}{n}. \qquad (28)$$


The way to test in practice whether a series of observations
conform to the Gauss law is as follows. Calculate the
precision of a residual by the general formula in the table, and
(21), or by approximate formula (28). The number of
observations having a residual numerically not above $\epsilon$
should be close to

$$n\, \Theta(k' \epsilon). \qquad (29)$$


As a quick check, note whether nearly one-half the measures 
have a residual not greater than the probable error. Here is 
an example.* 


Example] In the years 1904 and 1905, 104 tests were made
of the atomic weight of iodine in the Chemical Laboratory
of Harvard University. Taking as a first
approximation $x_0 = 126.980$ and multiplying the
residuals by 1000 we get

* I owe this example to Messrs. William Eldredge and Denning Miller.

10³(x−x₀)  10³|x−x̄|  10⁶(x−x̄)²     10³(x−x₀)  10³|x−x̄|  10⁶(x−x̄)²
    13         11        121             3          1          1
    13         11        121             3          1          1
    11          9         81             2          0          0
    11          9         81             1          1          1
    10          8         64             0          2          4
    10          8         64             0          2          4
     9          7         49             0          2          4
     8          6         36            −1          3          9
     8          6         36            −1          3          9
     8          6         36            −1          3          9
     7          5         25            −3          5         25
     7          5         25            −3          5         25
     7          5         25            −3          5         25
     7          5         25            −3          5         25
     6          4         16            −4          6         36
     6          4         16            −4          6         36
     6          4         16            −5          7         49
     5          3          9            −5          7         49
     5          3          9            −6          8         64
     5          3          9            −7          9         81
     5          3          9            −7          9         81
     5          3          9            −7          9         81
     4          2          4            −8         10        100
     4          2          4           −11         13        169
     3          1          1           −11         13        169
     3          1          1           −13         15        225
   186        134        892           −94        150      1,282

$\bar{x} = 126.982$.
$1/k' = 0.0092$ [by table].    $1/k' = 0.0096$ [by (28)].

Errors less than    Observed.    Calculated.
     0.001              6            6.3
     0.002             11           12.5
     0.003             19           18.8
     0.004             22           24.5
     0.005             30           29
     0.006             35           33.4

The discrepancy is never greater than 4 per cent., generally 
less. On the other hand, the residuals are not symmetrically 
distributed above and below. 
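
The test just illustrated is readily mechanized. The sketch below is ours and is offered only as an illustration; the residuals are hypothetical, not those of the iodine example. It assumes the approximate rule that the reciprocal of the precision of a residual is the square root of π (about 1.77) times the average numerical residual, and that Θ is the ordinary error function, `math.erf`.

```python
import math


def residual_precision(residuals):
    """Approximate precision k' of a residual: 1/k' = sqrt(pi) * sum|e| / n."""
    n = len(residuals)
    return n / (math.sqrt(math.pi) * sum(abs(e) for e in residuals))


def gauss_check(residuals, limits):
    """For each limit, the observed number of residuals numerically not above it,
    and the number n * erf(k' * limit) called for by the Gauss law."""
    n = len(residuals)
    k_prime = residual_precision(residuals)
    rows = []
    for limit in limits:
        observed = sum(1 for e in residuals if abs(e) <= limit)
        calculated = n * math.erf(k_prime * limit)
        rows.append((limit, observed, calculated))
    return rows


# Hypothetical residuals, in units of the last recorded decimal place.
residuals = [-3, -2, -1, -1, 0, 1, 1, 2, 3]
rows = gauss_check(residuals, limits=[1, 2, 3])
```

With so few residuals close agreement is not to be expected; the rows merely show the form of the comparison.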


Problem. 
Work out a similar table. 


§ 3. Doubtful Observations.


It will frequently occur that when a large number of 
measurements have been taken of the same quantity, there 
will be one or more that differ very sharply from all of the 
others. These observations create a strong suspicion that in 
their cases there were additional causes of disturbance at 
work that did not apply in the case of the other measure- 
ments, and that, in consequence, these exceptional values 
should be rejected in making a calculation of the probable
error or precision. This question, as we shall see, is
exceedingly delicate, but it is insistent, and there can be no
doubt that many observers reject some of their observations 
by pure guess-work or common sense. 

Bertrand has pointed out by an ingenious analysis * that if 
we assume all of our measures to be equally trustworthy, 
and reject the worst ones, we shall decrease the probable 
error of the weighted mean. The reasoning is as follows: 

Suppose that we reject those observations whose errors are
so large numerically that the chance is less than $1 - p$ of
committing them. We have as a limit of error $\lambda$

$$\frac{2}{\sqrt{\pi}} \int_0^{k\lambda} e^{-t^2}\, dt = \Theta(k\lambda) = p. \qquad (30)$$

Let $\xi_1, \xi_2, \xi_3, \dots \xi_m$ be the errors of the observations
$x_1, x_2, \dots x_m$ which are retained. Assuming all measures of
equal weight, let us find the square of the mean error of our
new average

$$\frac{x_1 + x_2 + \dots + x_m}{m}.$$

This will not be $1/2mk^2$ as the reader might suppose, for
some observations, the worst, have been rejected, but will be

$$\frac{1}{m}\, [\text{mean value of } \xi^2],$$

when we mean by mean value of $\xi^2$, the mean value under
the present circumstances when the worst measurements have
been rejected.

* Bertrand, loc. cit., p. 211.

If we examine statistically into the probability that an
error shall take a particular value numerically less than $\lambda$,
that probability will be greater than it was before, for the
numerator is the same, but the denominator has been reduced
as the errors numerically above $\lambda$ have been rejected. We
shall have, approximately, $m = np$.

The mean value of the square of an error less than $\lambda$
numerically will now be

$$\frac{2k}{p\sqrt{\pi}} \int_0^{\lambda} \xi^2 e^{-k^2 \xi^2}\, d\xi.$$

Integrating by parts, and remembering (30), we have

$$-\frac{\lambda e^{-k^2 \lambda^2}}{p k \sqrt{\pi}} + \frac{1}{p k \sqrt{\pi}} \int_0^{\lambda} e^{-k^2 \xi^2}\, d\xi = \frac{1}{2k^2} \left[ 1 - \frac{2 k \lambda e^{-k^2 \lambda^2}}{p \sqrt{\pi}} \right].$$

Dividing by $m$, or rather $np$, we have for the square of the
mean error of the new mean

$$\frac{1}{2nk^2} \cdot \frac{\Theta(k\lambda) - \dfrac{2 k \lambda e^{-k^2 \lambda^2}}{\sqrt{\pi}}}{[\Theta(k\lambda)]^2}.$$

This is less than the square of the mean error of the old
mean by the second factor. Unfortunately we have no one
to tell us which observations we ought certainly to reject.
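
The size of the second factor is easily tabulated. The sketch below is ours; it assumes the factor to be Θ(kλ) less 2kλ·e^(−k²λ²)/√π, divided by the square of Θ(kλ), with Θ taken as `math.erf`, and the three limits chosen are hypothetical.

```python
import math


def truncation_factor(k_lambda):
    """Ratio of the squared mean error of the new mean to that of the old mean,
    as a function of t = k * lambda (the limit of error in units of 1/k)."""
    theta = math.erf(k_lambda)
    correction = 2.0 * k_lambda * math.exp(-k_lambda ** 2) / math.sqrt(math.pi)
    return (theta - correction) / theta ** 2


# Hypothetical limits: the wider the limit, the nearer the factor comes to 1.
factors = [truncation_factor(t) for t in (1.0, 2.0, 3.0)]
```

For these three limits the factor rises steadily toward unity, so that a generous limit alters the mean error but little, while a severe one reduces it markedly.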

A more natural proceeding is to assume that in a few cases
there has been at work a disturbing cause, not usually present.
The first writer to attack the question from this point of
view was Benjamin Peirce.* He set himself the following
general problem. Given $N$ observations, and a proposed
number to be rejected $n$, what is the numerical limit of error
that makes it more likely that the $n$ observations whose
residuals exceed this arose from a disturbing cause than from
the operation of the natural laws at work in the other
cases? Peirce’s solution is highly attractive. He frames two
hypotheses, first that there was no disturbing cause, second
that there was one. For the first hypothesis he calculates
the probability that $n$ observations should give errors as
large as the suspicious ones, and that the other observations
should give just the errors committed, multiplying the two
probabilities together. For the second hypothesis he rejects
these observations in toto, recalculates his precision, and the
probability of making just the other errors. This he multiplies
by the a priori probability that $n$ observations should be
disturbed and the others should not.

* Peirce, ‘Criterion for the Rejection of Doubtful Observations’,
Astronomical Journal, vol. ii, 1852.

Peirce’s paper aroused a good deal of discussion. It was
attacked by Airy * on the ground that no judgement should
be made as to errors a posteriori, but to this Winlock †
truthfully replied that the whole theory of errors was based
on just this ground. Other criticisms have been made, but
the real fault was never laid bare until many years later
when Stewart ‡ showed the absurdity of starting with a totally
unknown a priori probability, and calculating it by assuming
that it took its maximum.

A simpler rule than Peirce’s was devised by Chauvenet.§
The number of observations being $N$, if the probability of an
error numerically greater than $e$ be $p$, then $Np$ will be about
the number of errors numerically above $e$. If we set this
equal to $\tfrac{1}{2}$, and calculate $e$, we are unlikely to have an error
as large as that numerically, and larger errors should be
rejected. There are various possible objections to this, one
obvious one being that the calculus of probability deals with
ratios, not with actual numbers. The number of errors of
a given size will not be $Np$, but $Np \pm d$, where $d$ is an
unknown number, small compared with $N$.
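
Chauvenet's limit may be found by a short calculation. The sketch below is ours; it assumes the Gauss law, so that the probability of an error numerically greater than e is 1 − Θ(ke), with Θ taken as `math.erf`, and it solves Np = ½ for e by bisection. The function names are hypothetical.

```python
import math


def solve_erf(target, lo=0.0, hi=6.0, iterations=200):
    """Bisection for the value t with erf(t) = target, where 0 <= target < 1."""
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if math.erf(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def chauvenet_limit(n_observations, precision):
    """Limit e for which N * (1 - erf(k * e)) = 1/2."""
    target = 1.0 - 1.0 / (2.0 * n_observations)
    return solve_erf(target) / precision


# Hypothetical case: 10 observations whose mean error 1/(k * sqrt(2)) is unity.
k = 1.0 / math.sqrt(2.0)
limit = chauvenet_limit(10, k)
```

For ten observations the limit falls at about 1.96 times the mean error of a single observation; it moves outward slowly as the number of observations increases.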

A totally different method of attack was devised by Stone.||
His idea was that each observer erred grossly in a certain
proportion of his observations. If the probability that the
error of an observation should be as large as a certain number
be less than the probability that one of the $N$ observations
should be affected by the observer’s personal idiosyncrasy,
the observation should be rejected. There are two convincing
objections to this method of procedure. One is that we have
no exact knowledge of just how often an individual will
err in this way. The other is that, after we have calculated
the limit of acceptable observations for a series of $N$, and
find that perhaps one observation should be rejected, we
might, instead of rejecting this observation, keep on, and
observe $N$ more times, with no worse result. On the basis of
the $2N$ the observation which was suspicious before may now
be acceptable.

* Airy, ‘Remarks on Peirce’s Criterion’, Astronomical Journal, vol. iv, 1856.

† Winlock, ‘Airy’s Objections to Peirce’s Criterion’, ibid.

‡ Stewart, ‘Peirce’s Criterion’, Popular Astronomy, vol. xxviii, 1920.

§ Chauvenet, Astronomy, vol. i, 1863, p. 558.

|| Stone, ‘Rejection of Discordant Observations’, Monthly Notices R. Astr. Soc.,
vol. xxviii, 1868, xxxiv, 1874, and xxxv, 1875.

Stone’s proposal led him into rather an unedifying dispute 
with Glaisher, who proposed a method of his own.* His 
idea was to weight the various observations, deducing their 
weights by a method of successive approximations. Start 
in the usual way, and calculate the precision. Assuming the 
Gaussian law of error, this enables us to calculate the re- 
spective probabilities that the given series resulted from 
a true value equal to the first, the second ... the last of the 
given values. We next give to each observation a weight 
proportional to the square of the corresponding probability, 
find the new weighted mean and corresponding precision, and
begin over again. Glaisher assumes that eventually this
process will approach to a definite limit. It might well be
very long. Moreover, it seems to involve a certain petitio
principii. For the weight attached to an observation is
proportional to the square of the probability that the series 
arose from this true value when all the observations are 
equally trustworthy, and is a meaningless coefficient if they 
be otherwise. 

* Glaisher, ‘On the Rejection of Discordant Observations’, ibid., xxxiii
and xxxiv.

A number of critics have maintained that, a priori, it is
quite inadmissible to reject any one of a set of observations
when all are carried out with the same care. Our own view
is that such caution is excessive. It is all a question in the
probability of causes. Here is an observation, far away from
the mean of the others. It may have arisen from the same
causes which were operative in the other cases, there may
have been a disturbing cause. Let $\pi_1$ be the a priori
probability that all was as usual when this observation was
taken, $\pi_2$ the a priori probability that there was a disturbing
element, tending to favour this result. We do not know the
value of either of these, but may safely assume that the first
is considerably the larger. Let $p_1$ be the probability that
the particular measurement would be made in the natural
course, $p_2$ that the special disturbing element might produce
it. This latter we do not know, but may assume it large.
$p_1$ we can calculate. To compare the two hypotheses by
Bayes’ principle we must look at the fraction

$$\frac{\pi_1 p_1}{\pi_2 p_2}.$$

If $p_1$ be infinitesimally small, in spite of the likelihood that
$\pi_1$ is considerably larger than $\pi_2$, there is much reason to
suspect that the fraction is small, and the observation should
be rejected. It is the same principle we discussed in Ch. VI,
p. 95, in discussing the lawsuit over a roulette wheel.

The delicate point is the probability $p_1$. The safest plan
is to calculate $1 - p_1$, the probability that no one of the $n$
observations should vary so widely as the most suspicious
observation made. If this be as large as a fixed large
probability $P$, there will be strong grounds for the belief that
the worst observation did not arise in the natural course, and
that it should, consequently, be rejected. Analytically, the
probability that no error will be numerically above $r\epsilon$, where
$\epsilon$ is the probable error, is

$$[\Theta(0.4769\, r)]^n = P.$$

Given $P = 0.99$; $n = 30$.

We get $r \approx 5$.

A residual 5 times the probable error is, here, suspicious.
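
The multiple of the probable error at which a residual becomes suspicious may be computed for any number of observations. The sketch below is ours; it assumes Θ to be the ordinary error function, `math.erf`, and solves the condition that the n-th power of Θ(0.4769 r) shall equal P by bisection. The function names are hypothetical.

```python
import math


def solve_erf(target, lo=0.0, hi=6.0, iterations=200):
    """Bisection for the value t with erf(t) = target, where 0 <= target < 1."""
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if math.erf(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# k times the probable error; about 0.4769.
RHO = solve_erf(0.5)


def suspicious_multiple(n_observations, fixed_probability):
    """Value r with erf(RHO * r) ** n = P."""
    target = fixed_probability ** (1.0 / n_observations)
    return solve_erf(target) / RHO


# The case of the text: P = 0.99 and n = 30.
r = suspicious_multiple(30, 0.99)
check = math.erf(RHO * r) ** 30
```

For the case given in the text the computed multiple is a little above 5, in agreement with the round figure there stated.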




CHAPTER VIII 


ERRORS IN MANY VARIABLES 


§ 1. The Law of Error.* 


In all of the work done so far, we have tacitly assumed 
that we were studying errors in the observation of a single 
variable quantity. There are, however, cases where it is 
interesting and important to observe groups of quantities, 
and the corresponding groups of errors, in other words, error 
in measurements involving many independent variables. Our 
present task is to establish a plausible rule for the distribution 
of accidental errors in such cases. 

We must say, by way of preface, a word or two on the 
matter of notation. The strictly scientific method would be 
to use a system of double subscripts, the one to indicate the 
quantity, the other the observation. The resulting formulae 
would be compact, but would lack clearness. We assume, 
therefore, that we have n sets of measurements of m inde- 
pendent variables 


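$$(x_1, y_1, z_1, \dots),\ (x_2, y_2, z_2, \dots),\ \dots\ (x_n, y_n, z_n, \dots).$$

The true values shall be $X, Y, Z, \dots$. The true errors shall be

$$(\xi_1, \eta_1, \zeta_1, \dots),\ (\xi_2, \eta_2, \zeta_2, \dots),\ \dots\ (\xi_n, \eta_n, \zeta_n, \dots).$$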


Assumption 1] The mean value of an individual accidental 
error is zero. 


This is certainly plausible, for a contrary assumption would
involve a tendency towards a positive or negative error,
which should be classed with the constant errors.

* The present section, in so far as it deals with any number of variables,
is taken direct from an article by the Author, ‘The Gaussian Law of
Error for Any Number of Variables’, Transactions American Math. Soc., 1928.
Apparently the only other treatment is that of Von Mises,
‘Fundamentalsätze der Wahrscheinlichkeitsrechnung’, Math. Zeitschrift, vol. iv, 1919, and
‘Grundlagen der W.’, ibid., vol. v, 1920. See also Dodd, ‘Functions of
Measurements’, Särtryck ur Skandinavisk Aktuarietidskrift, Upsala, 1922.


Postulate 1] The Postulates 1]-5] for the best value of a
single observed quantity hold for each quantity of the 
group. 

We have for each of our quantities, exactly the assumptions 


for one quantity which were set up in the last chapter. We 
may thus write our best values in the form: 


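\[\bar{x} = \frac{\sum p_i x_i}{\sum p_i},\qquad \bar{y} = \frac{\sum q_i y_i}{\sum q_i},\qquad \bar{z} = \frac{\sum r_i z_i}{\sum r_i},\ \ldots \tag{1}\]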


Theorem 1] When a set of observations are made under the 
conditions of Assumptions 1-3, the best value for each 
quantity is a weighted mean. 


Theorem 2] When all observations of one quantity are equally 
trustworthy, the best value is their average. 


We shall, in future, use the words weighted mean in place 
of best value, the coefficients being the weights. 


Theorem 3] If it be possible to express each measurement as
an average of a certain number of standard observa- 
tions, then the weights in the weighted mean are 
proportional to the numbers of standard observations 
in each case. 


Theorem 4] If the mean error for the observation $x_i$ be $1/(k_i\sqrt{2})$,
the mean error for the weighted mean will be

\[\frac{1}{\sum p_i}\left(\sum \frac{p_i^2}{2k_i^2}\right)^{1/2}.\]


Theorem 5] The weights in the weighted mean are inversely
proportional to the squares of the corresponding mean
errors.


Let the residuals corresponding to the true errors $\xi_i, \eta_i, \zeta_i, \ldots$ be
\[\delta_i,\ \epsilon_i,\ \ldots \tag{2}\]


Assumption 2] When the number of observations is large, the 
mean value of each of the expressions such as 


SPi8?  Apie? A DsSi es (3) 
yy ay) 


may be replaced by its observed value. 


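Theorem 6] The mean errors of the weighted means $\bar{x}, \bar{y}, \bar{z}, \ldots$ are

\[\sqrt{\frac{\sum p_i\delta_i^2}{(n-1)\sum p_i}},\qquad \sqrt{\frac{\sum q_i\epsilon_i^2}{(n-1)\sum q_i}},\ \ldots\]
respectively.


Theorem 7] The mean errors of the individual observations
$x_i, y_i, \ldots$ are

\[\sqrt{\frac{\sum_j p_j\delta_j^2}{(n-1)\,p_i}},\qquad \sqrt{\frac{\sum_j q_j\epsilon_j^2}{(n-1)\,q_i}},\ \ldots\]


Theorem 8] When $x_i$ is the average of $p_i$ standard observations,
the mean error of one of these is

\[\sqrt{\frac{\sum p_i\delta_i^2}{n-1}}.\]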
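The theorems on the weighted mean may be illustrated numerically. The Python sketch below is an addition for illustration only; it assumes the usual reading of the theorems, namely that with $S = \sum p_i\delta_i^2$ the mean error of an observation of unit weight is $\sqrt{S/(n-1)}$, and the function names and the sample figures are hypothetical.

```python
import math


def weighted_mean(xs, ps):
    """Weighted mean of observations xs with weights ps."""
    return sum(p * x for p, x in zip(ps, xs)) / sum(ps)


def mean_errors(xs, ps):
    """Return (weighted mean, mean error of unit weight,
    mean error of the weighted mean, mean errors of each observation)."""
    n = len(xs)
    xbar = weighted_mean(xs, ps)
    # S = sum of p_i * delta_i^2, with residuals taken from the weighted mean
    s = sum(p * (x - xbar) ** 2 for p, x in zip(ps, xs))
    unit = math.sqrt(s / (n - 1))
    of_mean = math.sqrt(s / ((n - 1) * sum(ps)))
    of_each = [math.sqrt(s / ((n - 1) * p)) for p in ps]
    return xbar, unit, of_mean, of_each


# Hypothetical measurements of one quantity, with assumed weights.
xs = [10.2, 10.4, 10.1, 10.5]
ps = [1, 2, 1, 4]
xbar, unit, of_mean, of_each = mean_errors(xs, ps)
print(xbar, unit, of_mean, of_each)
```

An observation of weight 4 has, on this reckoning, half the mean error of an observation of unit weight, which is the content of Theorem 5.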


We must now try to develop a law of error for our groups 
of observations. For the sake of simplicity, we shall assume 
all groups are equally trustworthy, so that all are weighted 
alike. 


Assumption 3] The a priori probability that a group of
quantities to be measured should take values in the
infinitesimal region

\[X \pm \tfrac12 dX;\quad Y \pm \tfrac12 dY;\quad Z \pm \tfrac12 dZ;\ \ldots,\]
where the point $X, Y, Z, \ldots$ lies in a continuous $m$
dimensional region $S$, will differ by an infinitesimal
of higher order from
\[f(X, Y, Z, \ldots)\,dX\,dY\,dZ \ldots,\]
where the function $f$ is continuous, with continuous
first derivatives in $S$.


Assumption 4] The probability that a group of quantities
whose true values are $X, Y, Z, \ldots$ in $S$, should be
observed, after the removal of constant errors, to have
values in the infinitesimal region of $S$

\[x \pm \tfrac12 dx,\quad y \pm \tfrac12 dy,\quad z \pm \tfrac12 dz,\ \ldots,\]
will differ by an infinitesimal of higher order from
\[\Phi(X, Y, Z, \ldots;\ x, y, z, \ldots)\,dx\,dy\,dz \ldots,\]
where $\Phi$ is a function continuous in all of its arguments,
and with continuous first and second partial
derivatives, and is independent of the origin.


Assumption 5] If the infinitesimal increments be sufficiently
small, the probability that the true values lie in the
infinitesimal region

\[\bar{x} \pm \tfrac12 dX,\quad \bar{y} \pm \tfrac12 dY,\quad \bar{z} \pm \tfrac12 dZ,\ \ldots\]
is greater than that they lie in any other such region.

We have now a sufficient number of assumptions to determine
the form of our function. The fact that $\Phi$ is independent
of the origin enables us to write
\[\Phi(X, Y, Z, \ldots;\ x, y, z, \ldots) = \phi(\xi, \eta, \zeta, \ldots).\]

Let us further write

\[\phi(\xi_i, \eta_i, \zeta_i, \ldots) = \phi_i.\]
The probability that the observations were made on a group
with the true values $X, Y, Z, \ldots$ will be
\[\frac{f(X, Y, Z, \ldots)\,\phi_1\phi_2\cdots\phi_n\,dX\,dY\,dZ\ldots}{\int\!\cdots\!\int f(X, Y, Z, \ldots)\,\phi_1\phi_2\cdots\phi_n\,dX\,dY\,dZ\ldots}. \tag{4}\]


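This will be a maximum with the logarithm of the numerator.
Taking the partial derivatives (the true errors being $\xi_i = x_i - X$, and so on),

\[\begin{aligned}
\frac{1}{f}\frac{\partial f}{\partial X} - \left(\frac{\partial\log\phi_1}{\partial\xi_1} + \cdots + \frac{\partial\log\phi_n}{\partial\xi_n}\right) &= 0,\\
\frac{1}{f}\frac{\partial f}{\partial Y} - \left(\frac{\partial\log\phi_1}{\partial\eta_1} + \cdots + \frac{\partial\log\phi_n}{\partial\eta_n}\right) &= 0,\\
\cdots
\end{aligned} \tag{5}\]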


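Since $f$ is independent of the observations, in the particular
case where each set is exactly right

\[\frac{1}{f}\frac{\partial f}{\partial X} - n\frac{\partial\log\phi}{\partial\xi} = 0,\qquad
\frac{1}{f}\frac{\partial f}{\partial Y} - n\frac{\partial\log\phi}{\partial\eta} = 0,\ \ldots, \tag{6}\]
the derivatives of $\log\phi$ being taken where all the arguments are zero.
Hence
\[\frac{\partial f}{\partial X} = \frac{\partial f}{\partial Y} = \frac{\partial f}{\partial Z} = \cdots = 0,\qquad f = \text{const.},\]
\[\begin{aligned}
\frac{1}{\phi_1}\frac{\partial\phi_1}{\partial\xi_1} + \frac{1}{\phi_2}\frac{\partial\phi_2}{\partial\xi_2} + \cdots + \frac{1}{\phi_n}\frac{\partial\phi_n}{\partial\xi_n} &= 0,\\
\frac{1}{\phi_1}\frac{\partial\phi_1}{\partial\eta_1} + \frac{1}{\phi_2}\frac{\partial\phi_2}{\partial\eta_2} + \cdots + \frac{1}{\phi_n}\frac{\partial\phi_n}{\partial\eta_n} &= 0,\\
\cdots
\end{aligned}\]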


These equations hold when
\[\begin{aligned}
p_1\xi_1 + p_2\xi_2 + \cdots + p_n\xi_n &= 0,\\
q_1\eta_1 + q_2\eta_2 + \cdots + q_n\eta_n &= 0,\\
r_1\zeta_1 + r_2\zeta_2 + \cdots + r_n\zeta_n &= 0.
\end{aligned}\]


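Now let $x_1, x_2, \ldots, x_n$ take infinitesimal increments, subject
to (1). We have

\[\begin{aligned}
\frac{\partial}{\partial\xi_1}\left[\frac{1}{\phi_1}\frac{\partial\phi_1}{\partial\xi_1}\right]d\xi_1 + \frac{\partial}{\partial\xi_2}\left[\frac{1}{\phi_2}\frac{\partial\phi_2}{\partial\xi_2}\right]d\xi_2 + \cdots + \frac{\partial}{\partial\xi_n}\left[\frac{1}{\phi_n}\frac{\partial\phi_n}{\partial\xi_n}\right]d\xi_n &= 0,\\
\frac{\partial}{\partial\xi_1}\left[\frac{1}{\phi_1}\frac{\partial\phi_1}{\partial\eta_1}\right]d\xi_1 + \frac{\partial}{\partial\xi_2}\left[\frac{1}{\phi_2}\frac{\partial\phi_2}{\partial\eta_2}\right]d\xi_2 + \cdots + \frac{\partial}{\partial\xi_n}\left[\frac{1}{\phi_n}\frac{\partial\phi_n}{\partial\eta_n}\right]d\xi_n &= 0,\\
p_1\,d\xi_1 + p_2\,d\xi_2 + \cdots + p_n\,d\xi_n &= 0.
\end{aligned}\]

Since those which precede the last must hold whenever
the last does,

\[\frac{1}{p_i}\frac{\partial}{\partial\xi_i}\left[\frac{1}{\phi_i}\frac{\partial\phi_i}{\partial\xi_i}\right] = a,\qquad
\frac{1}{p_i}\frac{\partial}{\partial\xi_i}\left[\frac{1}{\phi_i}\frac{\partial\phi_i}{\partial\eta_i}\right] = b,\ \ldots\qquad (i = 1, 2, \ldots, n),\]

\[\frac{\partial\log\phi}{\partial\xi} = a\xi + b\eta + \cdots,\qquad
\frac{\partial\log\phi}{\partial\eta} = b\xi + c\eta + \cdots,\]

\[\phi = R\,e^{-\psi^2(\xi, \eta, \zeta, \ldots)}. \tag{8}\]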


Here $\psi^2$ is a quadratic function, necessarily homogeneous,
for we have already seen that the partial derivatives vanish 
when the arguments are all zero. Moreover, its discriminant 
is not zero, for if it were, the partial derivatives would be 
linearly dependent, and vanish for an infinite number of real 
sets of the variables, and this is in direct conflict with our 
assumption that the maximum arises only from taking all the 
values equal to zero. Moreover, since this is known to be 
a maximum, this form must be definite and positive, since 
otherwise the maximum would be attained at infinity: 

The homogeneous quadratic form $\psi^2$ is definite and positive
with a non-vanishing discriminant.

We next notice that all the work done so far has been in 
a certain region S. We have found the probability that an 
observation in S should lie in a certain infinitesimal sub- 
region. What is the region S? It could not be the whole 
of space, as the assumption that f is everywhere a constant 
will lead to the absurd conclusion 


\[\int\!\cdots\!\int f(X, Y, Z, \ldots)\,dX\,dY\,dZ \ldots = \infty.\]


On further consideration two more facts appear. First, it 
seems plausible to assume that f is constant throughout 
a certain region, and drops away very rapidly outside of it. 
Second, the expression (8) is extremely small outside of a very
restricted part of space. As this will produce a result like
that produced by the disappearance of $f$, the error in calculating
the constants will be small if we allow $S$ to extend
throughout all space.


Assumption 6] For the sake of calculating constants, formula 
(8) may be assumed true everywhere. 


We have also at our disposal Assumption 2], and this with
6] will be enough to solve the problem. As, however, the
solution is rather long, we shall begin with the case of two
variables, and assume that all observations are equally
trustworthy. Suppose that the law of error is expressed
by the equation

\[\phi = R\,e^{-(a\xi^2 + 2b\xi\eta + c\eta^2)}. \tag{9}\]

The curves
\[a\xi^2 + 2b\xi\eta + c\eta^2 = \text{const.}\]
are curves of like probability, and cannot run off to infinity
by Assumption 2]. Hence these curves must be ellipses and

\[b^2 - ac < 0.\]

By a rotation of the $\xi, \eta$ plane about the origin, these
ellipses may be written

\[a'\xi'^2 + c'\eta'^2 = \text{const.}\]
The theory of invariants for conics shows us that
\[a + c = a' + c';\qquad b^2 - ac = -a'c'. \tag{10}\]

Since the sum of all probabilities is unity

\[1 = R\int_{-\infty}^{\infty}\!\!\int_{-\infty}^{\infty} e^{-(a\xi^2 + 2b\xi\eta + c\eta^2)}\,d\xi\,d\eta
= R\int_{-\infty}^{\infty} e^{-a'\xi'^2}d\xi'\int_{-\infty}^{\infty} e^{-c'\eta'^2}d\eta'
= \frac{R\pi}{\sqrt{a'c'}} = \frac{R\pi}{\sqrt{ac - b^2}},\]

\[R = \frac{\sqrt{ac - b^2}}{\pi}.\]


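We find, by Assumption 2],
\[\begin{aligned}
\frac{\sqrt{ac - b^2}}{\pi}\int_{-\infty}^{\infty}\!\!\int_{-\infty}^{\infty} \xi^2\,e^{-(a\xi^2 + 2b\xi\eta + c\eta^2)}\,d\xi\,d\eta &= \frac{\sum\delta_i^2}{n-1},\\
\frac{\sqrt{ac - b^2}}{\pi}\int_{-\infty}^{\infty}\!\!\int_{-\infty}^{\infty} \eta^2\,e^{-(a\xi^2 + 2b\xi\eta + c\eta^2)}\,d\xi\,d\eta &= \frac{\sum\epsilon_i^2}{n-1},\\
\frac{\sqrt{ac - b^2}}{\pi}\int_{-\infty}^{\infty}\!\!\int_{-\infty}^{\infty} \xi\eta\,e^{-(a\xi^2 + 2b\xi\eta + c\eta^2)}\,d\xi\,d\eta &= \frac{\sum\delta_i\epsilon_i}{n-1}.
\end{aligned}\]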


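To solve the first of these, we write
\[a\xi^2 + 2b\xi\eta + c\eta^2 = \frac{ac - b^2}{c}\,\xi^2 + \left(\frac{b}{\sqrt{c}}\,\xi + \sqrt{c}\,\eta\right)^2,\]
\[\xi' = \xi,\qquad \eta' = \frac{b}{\sqrt{c}}\,\xi + \sqrt{c}\,\eta,\]
\[\frac{\partial(\xi', \eta')}{\partial(\xi, \eta)} = \sqrt{c}.\]
Changing variables, we have
\[\frac{\sqrt{ac - b^2}}{\pi}\int\!\!\int \xi^2\,e^{-(a\xi^2 + 2b\xi\eta + c\eta^2)}\,d\xi\,d\eta
= \frac{\sqrt{ac - b^2}}{\pi\sqrt{c}}\int_{-\infty}^{\infty}\xi'^2 e^{-\frac{ac - b^2}{c}\xi'^2}d\xi'\int_{-\infty}^{\infty}e^{-\eta'^2}d\eta'
= \frac{c}{2(ac - b^2)}.\]
Similarly
\[\frac{\sqrt{ac - b^2}}{\pi}\int\!\!\int \eta^2\,e^{-(a\xi^2 + 2b\xi\eta + c\eta^2)}\,d\xi\,d\eta = \frac{a}{2(ac - b^2)}.\]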

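There remains
\[\frac{\sqrt{ac - b^2}}{\pi}\int_{-\infty}^{\infty}\xi\,d\xi\int_{-\infty}^{\infty}\eta\,e^{-(a\xi^2 + 2b\xi\eta + c\eta^2)}\,d\eta.\]

Putting, as before,

\[\xi' = \xi,\qquad \eta' = \frac{b}{\sqrt{c}}\,\xi + \sqrt{c}\,\eta,\qquad \eta = \frac{1}{\sqrt{c}}\left(\eta' - \frac{b}{\sqrt{c}}\,\xi'\right).\]

Since
\[\int_{-\infty}^{\infty}\eta'\,e^{-\eta'^2}d\eta' = 0,\]

we have

\[\frac{\sqrt{ac - b^2}}{\pi}\int\!\!\int \xi\eta\,e^{-(a\xi^2 + 2b\xi\eta + c\eta^2)}\,d\xi\,d\eta
= -\frac{b\sqrt{ac - b^2}}{\pi c\sqrt{c}}\int_{-\infty}^{\infty}\xi'^2 e^{-\frac{ac - b^2}{c}\xi'^2}d\xi'\int_{-\infty}^{\infty}e^{-\eta'^2}d\eta'
= \frac{-b}{2(ac - b^2)},\]

\[\frac{c}{2(ac - b^2)} = \frac{\sum\delta_i^2}{n-1},\qquad
\frac{a}{2(ac - b^2)} = \frac{\sum\epsilon_i^2}{n-1},\qquad
\frac{-b}{2(ac - b^2)} = \frac{\sum\delta_i\epsilon_i}{n-1},\]

\[c : a : b = \sum\delta_i^2 : \sum\epsilon_i^2 : -\sum\delta_i\epsilon_i.\]

\[\begin{aligned}
a &= \frac{\sum\epsilon_i^2\,(n-1)}{2\left[\sum\delta_i^2\sum\epsilon_i^2 - \left(\sum\delta_i\epsilon_i\right)^2\right]},\\
b &= \frac{-\sum\delta_i\epsilon_i\,(n-1)}{2\left[\sum\delta_i^2\sum\epsilon_i^2 - \left(\sum\delta_i\epsilon_i\right)^2\right]},\\
c &= \frac{\sum\delta_i^2\,(n-1)}{2\left[\sum\delta_i^2\sum\epsilon_i^2 - \left(\sum\delta_i\epsilon_i\right)^2\right]},\\
R &= \frac{n-1}{2\pi\left[\sum\delta_i^2\sum\epsilon_i^2 - \left(\sum\delta_i\epsilon_i\right)^2\right]^{1/2}},\\
\phi &= R\,e^{-(a\xi^2 + 2b\xi\eta + c\eta^2)}.
\end{aligned} \tag{11}\]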
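The constants of the law of error for two variables may be computed directly from the residuals. The Python sketch below is an illustration added to the text and assumes the formulae (11) in the form $a = (n-1)\sum\epsilon_i^2/2Q$, and so on, with $Q = \sum\delta_i^2\sum\epsilon_i^2 - (\sum\delta_i\epsilon_i)^2$; the function name `error_law_2d` and the sample residuals are hypothetical.

```python
import math


def error_law_2d(deltas, epsilons):
    """Constants a, b, c, R of phi = R*exp(-(a*xi^2 + 2*b*xi*eta + c*eta^2))
    computed from paired residuals (delta_i, epsilon_i)."""
    n = len(deltas)
    sdd = sum(d * d for d in deltas)
    see = sum(e * e for e in epsilons)
    sde = sum(d * e for d, e in zip(deltas, epsilons))
    q = sdd * see - sde * sde  # must be positive for a definite form
    a = see * (n - 1) / (2.0 * q)
    b = -sde * (n - 1) / (2.0 * q)
    c = sdd * (n - 1) / (2.0 * q)
    big_r = (n - 1) / (2.0 * math.pi * math.sqrt(q))
    return a, b, c, big_r


# Hypothetical residuals, each set summing to zero.
deltas = [1.0, -2.0, 0.5, 0.5]
epsilons = [0.5, -1.5, 1.5, -0.5]
a, b, c, big_r = error_law_2d(deltas, epsilons)
print(a, b, c, big_r)
```

As a check on the algebra, the constant $R$ so found should equal $\sqrt{ac - b^2}/\pi$, and $c/2(ac - b^2)$ should return $\sum\delta_i^2/(n-1)$.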


¶ The problem of finding the actual coefficients in the
general case * does not seem to lend itself to an analogous
method. We therefore take up the question from the start;
the theoretical importance of the problem seems sufficient to
warrant the labour involved. We shall begin by a change of
notation in order to use forms which are frequent in the
study of linear transformations. Let us write

\[\xi = x_1,\quad \eta = x_2,\quad \zeta = x_3,\ \ldots\]
so that (8) becomes

\[\phi = R\,e^{-\sum_{i,j=1}^{m} a_{ij}x_ix_j}. \tag{12}\]


* Cf. Greiner, Zeitschrift für Mathematik und Physik, vol. lvii, p. 226; and
Pearson, Philosophical Transactions, vol. clxxxvii, p. 299 ff.


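We must first consider the discriminant

\[\begin{vmatrix} a_{11} & \cdots & a_{1m}\\ \vdots & & \vdots\\ a_{m1} & \cdots & a_{mm}\end{vmatrix} = |a_{ij}|.\]

Since the quadratic form is definite, this discriminant is
not 0, and we may find such a linear transformation

\[x_i = \sum_{k=1}^{m} c_{ik}x_k',\qquad |c_{ij}| \neq 0, \tag{13}\]
that
\[\sum_{i,j=1}^{m} a_{ij}x_ix_j = \sum_{i,j,k,l=1}^{m} a_{ij}c_{ik}c_{jl}\,x_k'x_l' = \sum_{k=1}^{m} b_kx_k'^2.\]

Hence
\[\sum_{i,j=1}^{m} a_{ij}c_{ik}c_{jl} = 0,\qquad k \neq l, \tag{14}\]
\[\sum_{i,j=1}^{m} a_{ij}c_{ik}c_{jk} = b_k,\]
\[b_1b_2\cdots b_m = |c_{ij}|^2\,|a_{ij}|.\]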


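The inverse of the substitution contragredient to (13) is

\[u_k' = \sum_{i=1}^{m} c_{ik}u_i,\]

\[u_k'^2 = \sum_{i,j=1}^{m} c_{ik}c_{jk}\,u_iu_j.\]

The hyperquadric in $(m-1)$ dimensional space
\[\sum_{i,j=1}^{m} a_{ij}x_ix_j = 0\]
has the tangential equation
\[\sum_{i,j=1}^{m} A_{ij}u_iu_j = 0,\]
where $A_{ij}$ is the cofactor of $a_{ij}$ in $|a_{ij}|$.

In the new variables the point equation

\[\sum_{k=1}^{m} b_kx_k'^2 = 0\]

corresponds to the tangential equation

\[\sum_{k=1}^{m}\frac{u_k'^2}{b_k} = 0,\]
\[|a_{ij}|\sum_{k=1}^{m}\frac{c_{ik}c_{jk}}{b_k} = A_{ij}. \tag{15}\]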
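As a second step in our development, let us consider the
residuals
\[\delta_{11}, \delta_{12}, \ldots, \delta_{1m};\quad \delta_{21}, \delta_{22}, \ldots, \delta_{2m};\quad \ldots;\quad \delta_{n1}, \delta_{n2}, \ldots, \delta_{nm}.\]

We write for brevity

\[p_{ij} = \frac{\sum_{k=1}^{n}\delta_{ki}\delta_{kj}}{n-1}. \tag{16}\]

We get from Assumption 2]
\[p_{ij} = R\int_{-\infty}^{\infty}\!\!\cdots\!\int_{-\infty}^{\infty} x_ix_j\,e^{-\sum_{k,l} a_{kl}x_kx_l}\,dx_1\cdots dx_m. \tag{17}\]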


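Let us change variables, remembering that the Jacobian
is $|c_{ij}|$,

\[p_{ij} = R\,|c_{ij}|\int_{-\infty}^{\infty}\!\!\cdots\!\int_{-\infty}^{\infty}\ \sum_{k,l=1}^{m} c_{ik}c_{jl}\,x_k'x_l'\ e^{-\sum_{h} b_hx_h'^2}\,dx_1'\cdots dx_m'.\]
Now, when $k \neq l$,
\[\int_{-\infty}^{\infty} x_k'\,e^{-b_kx_k'^2}dx_k'\int_{-\infty}^{\infty} x_l'\,e^{-b_lx_l'^2}dx_l' = 0.\]

Hence
\[p_{ij} = R\,|c_{ij}|\int_{-\infty}^{\infty}\!\!\cdots\!\int_{-\infty}^{\infty}\ \sum_{k=1}^{m} c_{ik}c_{jk}\,x_k'^2\ e^{-\sum_{h} b_hx_h'^2}\,dx_1'\cdots dx_m'.\]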


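We know, further, that

\[\int_{-\infty}^{\infty} x_k'^2\,e^{-b_kx_k'^2}dx_k' = \frac{\sqrt{\pi}}{2\,b_k^{3/2}}.\]
Hence
\[p_{ij} = \frac{R\,|c_{ij}|\,\pi^{m/2}}{2\,(b_1b_2\cdots b_m)^{1/2}}\sum_{k=1}^{m}\frac{c_{ik}c_{jk}}{b_k}.\]
Hence, by (15)
\[p_{ij} = \frac{R\,|c_{ij}|\,\pi^{m/2}\,A_{ij}}{2\,|a_{ij}|\,(b_1b_2\cdots b_m)^{1/2}} = \frac{R\,\pi^{m/2}\,A_{ij}}{2\,|a_{ij}|^{3/2}}.\]

Furthermore,

\[1 = R\int_{-\infty}^{\infty}\!\!\cdots\!\int_{-\infty}^{\infty} e^{-\sum_{k,l} a_{kl}x_kx_l}\,dx_1\cdots dx_m
= \frac{R\,\pi^{m/2}\,|c_{ij}|}{(b_1b_2\cdots b_m)^{1/2}} = \frac{R\,\pi^{m/2}}{\sqrt{|a_{ij}|}}.\]
Dividing out
\[p_{ij} = \frac{A_{ij}}{2\,|a_{ij}|}.\]

Here the quantities $p_{ij}$ are known. We wish to solve for
the unknown $a_{ij}$'s. We first write

\[P_{ij} = \frac{\partial\,|p_{ij}|}{\partial p_{ij}}.\]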

Since the process of interchanging each element of a non-vanishing
determinant with its cofactor is an involutory one,
except for multiplication by a positive or negative power of
the determinant, we shall evidently have

\[a_{ij} = M\,P_{ij},\]
\[|a_{ij}| = M^{m}\,|p_{ij}|^{m-1},\]
\[|p_{ij}| = \frac{1}{2^{m}\,|a_{ij}|},\]
\[M = \frac{1}{2\,|p_{ij}|}.\]

We thus reach our final equations

\[\phi = R\,e^{-\sum a_{ij}x_ix_j},\]

\[a_{ij} = \frac{1}{2}\,\frac{\partial\log|p_{ij}|}{\partial p_{ij}}, \tag{18}\]
\[p_{ij} = \frac{\sum_{k=1}^{n}\delta_{ki}\delta_{kj}}{n-1}.\]
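Read as a statement about matrices, the final equations say that the array of coefficients $a_{ij}$ is one-half the inverse of the array of the $p_{ij}$. The Python sketch below, added here for illustration, takes that reading as its assumption; the helper names `moment_matrix`, `inverse`, and `error_law_coefficients` are hypothetical, and the inversion is a plain Gauss-Jordan reduction rather than anything prescribed by the text.

```python
def moment_matrix(residuals):
    """p_ij = sum_k delta_ki * delta_kj / (n - 1) for n rows of m residuals."""
    n = len(residuals)
    m = len(residuals[0])
    return [[sum(row[i] * row[j] for row in residuals) / (n - 1)
             for j in range(m)] for i in range(m)]


def inverse(mat):
    """Invert a square matrix by Gauss-Jordan elimination with pivoting."""
    m = len(mat)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(m)]
           for i, row in enumerate(mat)]
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(aug[r][col]))
        if aug[pivot][col] == 0.0:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pv = aug[col][col]
        aug[col] = [v / pv for v in aug[col]]
        for r in range(m):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[m:] for row in aug]


def error_law_coefficients(residuals):
    """Coefficients a_ij of the quadratic form, taken as half the inverse of p."""
    return [[0.5 * v for v in row] for row in inverse(moment_matrix(residuals))]


# Hypothetical residuals for n = 4 groups of m = 2 quantities.
residuals = [(1.0, 0.5), (-2.0, -1.5), (0.5, 1.5), (0.5, -0.5)]
coeffs = error_law_coefficients(residuals)
print(coeffs)
```

For two variables the result should agree with the values of $a$, $b$, $c$ given by (11), which affords a check on the general formula.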


§ 2. The Error Ellipse. 


Let us return to a more careful study of the case where
there are but two quantities in a group. The curves
\[a\xi^2 + 2b\xi\eta + c\eta^2 = a'\xi'^2 + c'\eta'^2 = H \tag{19}\]
are ellipses, and are called error ellipses or ellipses of equal
probability. The meaning of the designation is easily seen.
If we take a small band on either side of such an ellipse, the
probability that the point representing a pair of values should
lie in a small region of this band is independent of the position
of the region with regard to the curve; the points should be
somewhat uniformly distributed throughout the band. To
study this ellipse, we must return to our equations of transformation
\[b^2 - ac = -a'c',\qquad a + c = a' + c',\]
the relation between the two sets of variables is
\[\xi' = \xi\cos\theta + \eta\sin\theta,\qquad \eta' = -\xi\sin\theta + \eta\cos\theta,\]

\[\tan 2\theta = -\frac{2b}{c - a},\]
\[a' = \frac{a + c}{2} + b\csc 2\theta,\qquad c' = \frac{a + c}{2} - b\csc 2\theta, \tag{20}\]


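the semi-axes of the ellipse are
\[(H/a')^{1/2},\qquad (H/c')^{1/2}.\]
Its area is
\[\frac{\pi H}{\sqrt{a'c'}} = \frac{\pi H}{\sqrt{ac - b^2}}.\]

The probability of being between adjacent ellipses is
\[K\,e^{-H}\,\frac{\pi\,dH}{\sqrt{ac - b^2}}.\]
To find $K$, we have

\[\frac{K\pi}{\sqrt{ac - b^2}}\int_{0}^{\infty} e^{-H}dH = 1,\qquad K = \frac{\sqrt{ac - b^2}}{\pi}.\]

The probability of being in a small band is
\[e^{-H}dH.\]
The probability of falling outside an ellipse $H_1$ is

\[\int_{H_1}^{\infty} e^{-H}dH = e^{-H_1}.\]

About one-half of the points should be without the ellipse

\[e^{-H_1} = \tfrac12,\qquad H_1 = 0.6931.\]
This is called the ‘probable ellipse’. Its area is
\[\frac{0.6931\,\pi}{\sqrt{ac - b^2}}.\]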
In judging the performance of marksmen, it has been
suggested that they should be graded according to the smallness
of their error ellipses, i.e. the better marksman is the
one for whom $ac - b^2$ has the larger value. In extreme cases
this method may work badly.*

* Cf. Bertrand, loc. cit., pp. 286 ff.

As an example of how such material may be handled, we
take a case that has perhaps more historical than mathematical
interest, the ‘Big Bertha’ shots that fell on Paris in 1918.
The number of shots is not very large; we take 100, which is
nearly the total, and they were not all fired from the same
spot. The major axis of the probable ellipse, as shown in
Fig. 4, does point in a general way in the direction whence
the firing came. This, of course, we should expect under any
circumstances. Errors in range are likely to show more
variation than errors in direction, and a set of shots which
took a circular distribution on a point-blank target would
take an elliptical one on a target not perpendicular to the
central curve.

[Fig. 4.]

The details of the calculation used in finding
the probable ellipse are


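\[n = 100,\]
\[\frac{\sum\delta_i^2}{n-1} = 3{,}270,\qquad \frac{\sum\delta_i\epsilon_i}{n-1} = 1{,}180,\qquad \frac{\sum\epsilon_i^2}{n-1} = 2{,}030,\]
\[a = \frac{203}{1{,}049{,}140},\qquad b = \frac{-118}{1{,}049{,}140},\qquad c = \frac{327}{1{,}049{,}140},\]

\[a = 0.000193,\qquad b = -0.000113,\qquad c = 0.000312,\]
\[\tan 2\theta = +1.9,\qquad \theta = 31^\circ,\]
\[a' = 0.000124,\qquad c' = 0.000380.\]


The axes of the probable ellipse are
\[\alpha = 75,\qquad \beta = 43.\]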

In the figure there are only 45 points within the probable 
ellipse, but had it been just a little larger, fully one-half 
would have been therein. The centre is close to the Louvre, 
and the major axis passes close to the Gare de l’Est.
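The figures of this example can be retraced from the rounded constants $a$, $b$, $c$ given above. The Python sketch that follows is an added illustration, not the author's computation; it assumes the relations $\tan 2\theta = 2b/(a - c)$, $a' = \tfrac12(a + c) + b\csc 2\theta$, $c' = \tfrac12(a + c) - b\csc 2\theta$, and takes $H_1 = \log 2$ for the probable ellipse. The function name `probable_ellipse` is hypothetical.

```python
import math


def probable_ellipse(a, b, c):
    """Return (theta in degrees, a', c', semi-axis along xi', semi-axis along eta').

    Assumes a != c and b != 0, so that tan(2*theta) and csc(2*theta) exist.
    """
    theta = 0.5 * math.atan(2.0 * b / (a - c))
    csc = 1.0 / math.sin(2.0 * theta)
    a1 = 0.5 * (a + c) + b * csc
    c1 = 0.5 * (a + c) - b * csc
    h1 = math.log(2.0)  # e^(-H1) = 1/2
    return math.degrees(theta), a1, c1, math.sqrt(h1 / a1), math.sqrt(h1 / c1)


theta_deg, a1, c1, alpha, beta = probable_ellipse(0.000193, -0.000113, 0.000312)
print(theta_deg, a1, c1, alpha, beta)
```

With the rounded inputs the sketch reproduces the printed values to within the rounding: an angle near $31^\circ$ and semi-axes near 75 and 43.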


§ 3. The Correlation Coefficient. 


Until recently, the only interest attached to errors in two
variables was the ballistic one, but now new applications
have arisen in connexion with statistics. A fundamental
question in many sorts of statistical work, especially in
sociological and biological sciences, is whether two characteristics
which are noted in a large number of individuals are
connected in some way, or vary independently. It is evident
that the assimilation of such measurements and variations to
accidental errors of observation is very crude. The errors
here are committed by Nature as she varies one way or the
other from the average. Nevertheless, in a good many cases,
our Assumptions 1] to 6] do fit her methods of operating
with considerable closeness.

Suppose, then, that in the case of a number of individuals,
we measure the same pair of characteristics, and plot the
pairs of measurements as points in a plane. In order to
bring our notation into conformity with that used by statisticians,
we shall call the residuals $x_1, y_1;\ x_2, y_2;\ \ldots;\ x_n, y_n$. If
the two characteristics were so connected that the one increased
above the mean proportionately to the increase of the
other, the points plotted would lie on a line of positive slope;
if the increase of one were proportional to the decrease of
the other, the line would have a negative slope. If the
characteristics were completely independent of one another,
the mean value of their product would be 0, and the axes of
the ellipse would be the axes of $x$ and $y$.


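We write in the usual statistical notation

\[\sigma_x^2 = \frac{\sum x_i^2}{n},\qquad \sigma_y^2 = \frac{\sum y_i^2}{n}.\]

The number
\[r = \frac{\sum x_iy_i}{\sqrt{\sum x_i^2}\,\sqrt{\sum y_i^2}} = \frac{\sum x_iy_i}{n\,\sigma_x\sigma_y}\]
is called the correlation coefficient. We suppose $n$ so large
that we may safely put
\[\frac{n}{n-1} = 1.\]

Under these circumstances, we have from (11) and (20),

\[a = \frac{\sigma_y^2}{2\sigma_x^2\sigma_y^2(1 - r^2)} = \frac{1}{2\sigma_x^2(1 - r^2)},\]
\[b = \frac{-r}{2\sigma_x\sigma_y(1 - r^2)},\]
\[c = \frac{1}{2\sigma_y^2(1 - r^2)},\]
\[\tan 2\theta = \frac{2r\sigma_x\sigma_y}{\sigma_x^2 - \sigma_y^2},\]
\[\csc 2\theta = \frac{\left((\sigma_x^2 + \sigma_y^2)^2 - 4\sigma_x^2\sigma_y^2(1 - r^2)\right)^{1/2}}{2r\sigma_x\sigma_y},\]
\[a' = \frac{\sigma_x^2 + \sigma_y^2 - \left((\sigma_x^2 + \sigma_y^2)^2 - 4\sigma_x^2\sigma_y^2(1 - r^2)\right)^{1/2}}{4\sigma_x^2\sigma_y^2(1 - r^2)},\]
\[c' = \frac{\sigma_x^2 + \sigma_y^2 + \left((\sigma_x^2 + \sigma_y^2)^2 - 4\sigma_x^2\sigma_y^2(1 - r^2)\right)^{1/2}}{4\sigma_x^2\sigma_y^2(1 - r^2)},\]
\[\frac{a'}{c'} = \frac{\sigma_x^2 + \sigma_y^2 - \left((\sigma_x^2 + \sigma_y^2)^2 - 4\sigma_x^2\sigma_y^2(1 - r^2)\right)^{1/2}}{\sigma_x^2 + \sigma_y^2 + \left((\sigma_x^2 + \sigma_y^2)^2 - 4\sigma_x^2\sigma_y^2(1 - r^2)\right)^{1/2}}.\]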

When $r$ is close to 1 or to $-1$ we say that we have
a strong positive or negative correlation. The difference
between $a'$ and $c'$ will be large, and the ratio of the axes will
be close to 0 or $\infty$. The ellipse will be excessively flat, and
the two sets of residuals will tend to vary proportionately.
On the other hand, when $r$ is close to 0 we say the correlation
is weak. $b$ will be very small. Then either $\theta$ will be very
small, and the probability function will be close to

\[e^{-(ax^2 + cy^2)} = e^{-ax^2}\times e^{-cy^2},\]

which is characteristic of independent variation, or else $\sigma_x$ is
nearly equal to $\sigma_y$, $a'$ is nearly equal to $c'$, and we have
nearly a circular distribution which would also give $b$
close to 0.
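The passage from the statistical constants $\sigma_x$, $\sigma_y$, $r$ to the constants of the law of error is easily tried on figures. The Python sketch below is an illustration added here; it assumes $a = 1/2\sigma_x^2(1 - r^2)$, $b = -r/2\sigma_x\sigma_y(1 - r^2)$, $c = 1/2\sigma_y^2(1 - r^2)$, with $n$ written for $n - 1$, and the function names and sample residuals are hypothetical.

```python
import math


def correlation(xs, ys):
    """sigma_x, sigma_y and r for residuals xs, ys measured from their means."""
    n = len(xs)
    sx = math.sqrt(sum(x * x for x in xs) / n)
    sy = math.sqrt(sum(y * y for y in ys) / n)
    r = sum(x * y for x, y in zip(xs, ys)) / (n * sx * sy)
    return sx, sy, r


def law_from_correlation(sx, sy, r):
    """Constants a, b, c of the exponent a*x^2 + 2*b*x*y + c*y^2 (needs |r| < 1)."""
    k = 2.0 * (1.0 - r * r)
    return 1.0 / (k * sx * sx), -r / (k * sx * sy), 1.0 / (k * sy * sy)


# Hypothetical residuals, each set summing to zero.
xs = [1.0, -2.0, 0.5, 0.5]
ys = [0.5, -1.5, 1.5, -0.5]
sx, sy, r = correlation(xs, ys)
a, b, c = law_from_correlation(sx, sy, r)
print(r, a, b, c)
```

A positive $r$ gives a negative $b$, so that the longer axis of the ellipse runs through the first and third quadrants, as the discussion of slope requires.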

¶ It is fair to say that the usual method of arriving at the
correlation coefficient is quite different from this; for the sake
of completeness we sketch the customary proceeding.*

* Cf. Yule, ‘On the Significance of Bravais’ Formula’, Proceedings of the
Royal Society, vol. lx, 1896.

We start, as before, with the centre of gravity of the given
points as origin. There may be several $x$'s corresponding to
each $y$. Thus we might have on one horizontal line the
points
\[(x_{i1}, y_i),\ (x_{i2}, y_i),\ \ldots,\ (x_{im_i}, y_i),\]
whose centre of gravity would be the point
\[\bar{x}_i = \frac{1}{m_i}\,(x_{i1} + x_{i2} + \cdots + x_{im_i}).\]

If now $x$ and $y$ varied proportionately to one another, all
of the points $(\bar{x}_i, y_i)$ would be collinear. When they are not,
let us find what line does make the best graph of $\bar{x}$ as
a function of $y$. We shall call this a line of regression of $x$
on $y$. We mean by the best, that which will minimize the
sum of the squares of the weighted divergences of the values
$\bar{x}$ from the corresponding values of the function. Calling
the line
\[x = ky + a,\]
we must have

\[\sum_{i} m_i\,[\bar{x}_i - (ky_i + a)]^2 = \text{Min.}\]

Differentiating partially to $a$ and $k$,
\[\sum m_i\,[\bar{x}_i - (ky_i + a)] = 0,\]
\[\sum m_iy_i\,[\bar{x}_i - (ky_i + a)] = 0,\]
\[\sum m_i\bar{x}_i = \sum m_iy_i = \sum x_j = 0.\]

Here the third summation covers all the abscissae. It must
not be forgotten that the origin is at the centre of gravity,
hence
\[a = 0,\qquad k = \frac{\sum m_i\bar{x}_iy_i}{\sum m_iy_i^2} = \frac{\sum x_jy_j}{\sum y_j^2} = r\,\frac{\sigma_x}{\sigma_y}.\]

The line of regression of $x$ on $y$ is
\[x = r\,\frac{\sigma_x}{\sigma_y}\,y.\]
In the same way the line of regression of $y$ on $x$ is
\[y = r\,\frac{\sigma_y}{\sigma_x}\,x.\]

The tangent of the included angle is

\[\tan\vartheta = \frac{1 - r^2}{r}\cdot\frac{\sigma_x\sigma_y}{\sigma_x^2 + \sigma_y^2}.\]


When $r$ is close to 1 or $-1$, the angle between the two
lines of regression is close to 0 or $\pi$. Moreover, the sum of
the squares of the areas formed by the various pairs of points
with the origin is

\[\tfrac14\sum_{i<j}\,(x_iy_j - x_jy_i)^2 = \tfrac14\Bigl[\sum x_i^2\sum y_i^2 - \bigl(\sum x_iy_i\bigr)^2\Bigr] = \tfrac14\,n^2\sigma_x^2\sigma_y^2(1 - r^2).\]

When $r$ is close to 1 or $-1$, this will be close to 0, so that
all of the points will lie nearly on a line through the centre
of gravity. On the other hand, when $r$ is close to 0, the two
lines of regression are close to the axes. For each $y$, the
average $\bar{x}$ is about 0, and the characteristics are practically
independent of one another.
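The two lines of regression and the angle between them may be computed from the same sums. The Python sketch below is an added illustration under the assumption that the lines are $x = r(\sigma_x/\sigma_y)y$ and $y = r(\sigma_y/\sigma_x)x$; the function name `regression_lines` and the sample residuals are hypothetical.

```python
import math


def regression_lines(xs, ys):
    """Return (k for x = k*y, k for y = k*x, tangent of the included angle).

    xs, ys are residuals from the means; r must not be zero.
    """
    n = len(xs)
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sx = math.sqrt(sxx / n)
    sy = math.sqrt(syy / n)
    r = sxy / math.sqrt(sxx * syy)
    k_x_on_y = r * sx / sy
    k_y_on_x = r * sy / sx
    tan_angle = (1.0 - r * r) * sx * sy / (r * (sx * sx + sy * sy))
    return k_x_on_y, k_y_on_x, tan_angle


# Hypothetical residuals, each set summing to zero.
xs = [1.0, -2.0, 0.5, 0.5]
ys = [0.5, -1.5, 1.5, -0.5]
k_x_on_y, k_y_on_x, tan_angle = regression_lines(xs, ys)
print(k_x_on_y, k_y_on_x, tan_angle)
```

The product of the two coefficients is $r^2$, so that the lines coincide only when $r = \pm 1$.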

The great trouble at present with the theory of correlation 
seems to be that there is no general agreement as to how 
large r? must be in order that we may safely conclude that 
there is a real connexion between the two sets of phenomena. 

¶ Besides the correlation coefficient there is another number,
called the correlation ratio, which the statisticians sometimes 


employ. We begin with a new system of coordinates which 
is independent of the unit of measure, 


Then PDE PINE (5 


Let us group these according to the y’s as before. We 
shall have on a horizontal line 


(051 Yu.) (®ia! Yi) + Ci , Ye) 2 mM, =n. 
The dispersion of the abscissas is 
v 2 (45 — 
The total dispersion is 


A % (ay — 2)? = V(n- 3 m,h;), 
ij a 


ee eh 




This is called the correlation ratio of x’ on y’. It is equal 
to unity when, and only when, y is a single valued function 
of x. If the points %,’ y; all lie on a straight line we have 


a 


Ope ly’, 


3 (mH ;")) _ 
Tr awl (Snag (may?)) meee 
The corresponding points zy must lie in the line 


ad eo 
&=h, y oy” 
which must be the regression of # on y, hence 


hoe! y —=1. 


CHAPTER IX 


INDIRECT OBSERVATIONS 


§ 1. Least Square Method for Combining Indirect 
Observations. 


It frequently happens in physical measurements that we 
are not able to make a direct examination of the quantities 
which interest us, but must deduce their values from the 
observations of certain functions of them. We are faced in 
such cases with the problem of combining the observations 
in such a fashion as will best help us to estimate the values 
of the quantities in which we are interested. 

We first ask a question in pure mathematics. Suppose
that we observe the values of $n$ differentiable functions of $m$
variables, what are the values of the variables? There are
three obvious cases:

A) $n < m$. The problem is indeterminate if the functions
be independent; we can find no unique solution when the
number of equations is less than the number of unknowns.

B) n=m. The problem is determinate (usually) and 
depends upon our analytical skill in solving the equations. 

C) n>m. If the observed values contained no error 
whatever, some of the equations would result from the others, 
and we should fall back on a previous case. But, owing to 
accidental errors of observation, the system is incompatible 
and the question is, ‘ What do we propose to do about it ?’ 
An easy plan would be to discard some of the equations and 
solve the others, but we have no sure guide as to which 
equations might better be discarded, and we should certainly 
lose some accuracy, just as it is less accurate to take a single
measurement of a quantity than to take the average of
several discordant measurements.


In order to solve our problem, we must decide on the
meaning of the phrase ‘best values for the unknowns’. This
we shall do as we proceed with our analysis. Suppose that
we have observed the values of $n$ functions, not necessarily
distinct, of $m$ variables. These observational equations shall be

\[\begin{aligned}
f_1(u_1, u_2, \ldots, u_m) &= x_1,\\
f_2(u_1, u_2, \ldots, u_m) &= x_2,\\
\cdots\\
f_n(u_1, u_2, \ldots, u_m) &= x_n.
\end{aligned} \tag{1}\]

The $x$'s are observed values. Let us assume that we know
the weights of our observations, although we do not assume
that we know the probable errors. We assume also that the
number of equations is greater than the number of the unknowns,
and that the equations are inconsistent, so that we
cannot discard some and solve the rest.


Postulate 1] The best values for the unknowns are those which
will give a maximum value to the probability of 
obtaining just this series of measurements. 


Assumption 1] The error of each observation follows the 
Gaussian law with a proper precision. 


Assumption 2] We can make first approximations to the 
unknowns so accurate that in correcting the values of 
the f;, corrections above the first order may be neglected. 


Let $X_i$ be the true value for the function $f_i$; the true error
shall be $x_i - X_i = \xi_i$ and the precision $k_i$. We may follow the
reasoning of the previous chapter, which shows that all values
of $u_1, u_2, \ldots, u_m$ are equally likely. Our problem then is to
maximize
\[e^{-(k_1^2\xi_1^2 + k_2^2\xi_2^2 + \cdots + k_n^2\xi_n^2)},\]
and this amounts, in turn, to minimizing

\[k_1^2\xi_1^2 + k_2^2\xi_2^2 + \cdots + k_n^2\xi_n^2. \tag{2}\]

If we had taken as one of our assumptions that requiring
this expression to be a minimum, we might have abandoned
the assumption that the measures followed the law of Gauss.

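It will be well, in our present work, to use a large assortment
of symbols. Let us assume that the true values of the
unknowns are $U_1, U_2, \ldots, U_m$. These we do not know, and
never shall know. Let us assume that our good first approximations
are $\omega_1, \omega_2, \ldots, \omega_m$ and that their true errors are $\epsilon_1, \epsilon_2, \ldots, \epsilon_m$.
Then we have the true equations

\[f_i(\omega_1 + \epsilon_1,\ \omega_2 + \epsilon_2,\ \ldots,\ \omega_m + \epsilon_m) = X_i,\]
and by Assumption 2] these may be written

\[f_i(\omega_1, \omega_2, \ldots, \omega_m) + \frac{\partial f_i}{\partial\omega_1}\,\epsilon_1 + \cdots + \frac{\partial f_i}{\partial\omega_m}\,\epsilon_m = X_i = x_i - \xi_i,\]

hence the expression to minimize is
\[\sum_{i} k_i^2\Bigl[f_i(\omega_1, \ldots, \omega_m) + \sum_{j}\frac{\partial f_i}{\partial\omega_j}\,\epsilon_j - x_i\Bigr]^2.\]
We shall equate to 0 each of the partial derivatives of this
with regard to $\epsilon_1, \epsilon_2, \ldots, \epsilon_m$. We do not know the values of
the precisions, but we assumed that we knew the weights
which are proportional to their squares. Hence we have
$m$ equations
\[\sum_{i} p_i\,\frac{\partial f_i}{\partial\omega_k}\sum_{j}\frac{\partial f_i}{\partial\omega_j}\,\epsilon_j = \sum_{i} p_i\,\frac{\partial f_i}{\partial\omega_k}\,(x_i - f_i). \tag{3}\]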
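It is well to rewrite these equations using a symbolism which
is classical in this sort of work. We write, by definition,

\[\sum r_is_i = [rs],\qquad \sum r_is_it_i = [rst].\]

Then we have $m$ equations

\[\begin{aligned}
\Bigl[p\,\tfrac{\partial f}{\partial\omega_1}\tfrac{\partial f}{\partial\omega_1}\Bigr]\epsilon_1 + \Bigl[p\,\tfrac{\partial f}{\partial\omega_1}\tfrac{\partial f}{\partial\omega_2}\Bigr]\epsilon_2 + \cdots + \Bigl[p\,\tfrac{\partial f}{\partial\omega_1}\tfrac{\partial f}{\partial\omega_m}\Bigr]\epsilon_m &= \Bigl[p\,\tfrac{\partial f}{\partial\omega_1}(x - f)\Bigr],\\
\cdots\\
\Bigl[p\,\tfrac{\partial f}{\partial\omega_m}\tfrac{\partial f}{\partial\omega_1}\Bigr]\epsilon_1 + \Bigl[p\,\tfrac{\partial f}{\partial\omega_m}\tfrac{\partial f}{\partial\omega_2}\Bigr]\epsilon_2 + \cdots + \Bigl[p\,\tfrac{\partial f}{\partial\omega_m}\tfrac{\partial f}{\partial\omega_m}\Bigr]\epsilon_m &= \Bigl[p\,\tfrac{\partial f}{\partial\omega_m}(x - f)\Bigr].
\end{aligned} \tag{4}\]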


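These equations are called the normal equations. The
principal difficulty is to remember how to write them down.
An easy way is as follows. We begin with $n$ incompatible
equations, called residual equations,

\[\begin{aligned}
\frac{\partial f_1}{\partial\omega_1}\,\epsilon_1 + \frac{\partial f_1}{\partial\omega_2}\,\epsilon_2 + \cdots + \frac{\partial f_1}{\partial\omega_m}\,\epsilon_m &= x_1 - f_1(\omega_1 \ldots \omega_m),\\
\frac{\partial f_2}{\partial\omega_1}\,\epsilon_1 + \frac{\partial f_2}{\partial\omega_2}\,\epsilon_2 + \cdots + \frac{\partial f_2}{\partial\omega_m}\,\epsilon_m &= x_2 - f_2(\omega_1 \ldots \omega_m),\\
\cdots\\
\frac{\partial f_n}{\partial\omega_1}\,\epsilon_1 + \frac{\partial f_n}{\partial\omega_2}\,\epsilon_2 + \cdots + \frac{\partial f_n}{\partial\omega_m}\,\epsilon_m &= x_n - f_n(\omega_1 \ldots \omega_m).
\end{aligned} \tag{5}\]

These equations would become compatible if we replaced
the quantities $x_1, x_2, \ldots, x_n$ by their true values $X_1, X_2, \ldots, X_n$.
Multiply the first equation through by $p_1$ times the first
coefficient, the second by $p_2$ times the first coefficient, and so
on. The sum will be the first normal equation. For the $k$th
normal equation we use the $k$th coefficient each time.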

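The case which arises most often in practice is that where
the given functions are linear. Here we do not bother at all
about the first approximations $\omega_1, \omega_2, \ldots, \omega_m$, but take as
corrections the unknowns themselves. It makes for clearness,
also, to use a variety of letters, rather than to use double
subscripts. We write the residual equations

\[\begin{aligned}
a_1u_1 + b_1u_2 + \cdots + m_1u_m &= x_1,\\
a_2u_1 + b_2u_2 + \cdots + m_2u_m &= x_2,\\
\cdots\\
a_nu_1 + b_nu_2 + \cdots + m_nu_m &= x_n.
\end{aligned} \tag{6}\]
The normal equations then are
\[\begin{aligned}
{[paa]}\,u_1 + [pab]\,u_2 + \cdots + [pam]\,u_m &= [pax],\\
{[pba]}\,u_1 + [pbb]\,u_2 + \cdots + [pbm]\,u_m &= [pbx],\\
\cdots\\
{[pma]}\,u_1 + [pmb]\,u_2 + \cdots + [pmm]\,u_m &= [pmx].
\end{aligned} \tag{7}\]
\[\begin{vmatrix}
{[paa]} & [pab] & \cdots & [pam]\\
{[pba]} & [pbb] & \cdots & [pbm]\\
\vdots & & & \vdots\\
{[pma]} & [pmb] & \cdots & [pmm]
\end{vmatrix} = \Delta. \tag{8}\]
\[u_1 = \frac{1}{\Delta}\begin{vmatrix}
{[pax]} & [pab] & \cdots & [pam]\\
{[pbx]} & [pbb] & \cdots & [pbm]\\
\vdots & & & \vdots\\
{[pmx]} & [pmb] & \cdots & [pmm]
\end{vmatrix}. \tag{9}\]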


At this point we must consider a troublesome little theoretical
difficulty which most writers on the present subject calmly
ignore, namely, the possibility that the denominator might
be 0. Fortunately this cannot happen. Let us replace the
normal equations by the true equations

\[\begin{aligned}
\sqrt{p_1}\,a_1U_1 + \sqrt{p_1}\,b_1U_2 + \cdots + \sqrt{p_1}\,m_1U_m &= \sqrt{p_1}\,X_1,\\
\sqrt{p_2}\,a_2U_1 + \sqrt{p_2}\,b_2U_2 + \cdots + \sqrt{p_2}\,m_2U_m &= \sqrt{p_2}\,X_2,\\
\cdots\\
\sqrt{p_n}\,a_nU_1 + \sqrt{p_n}\,b_nU_2 + \cdots + \sqrt{p_n}\,m_nU_m &= \sqrt{p_n}\,X_n.
\end{aligned}\]

These equations being true are certainly consistent, and the
determinant of a set of $m$ of them cannot vanish in every
case unless there be not enough independent equations to
determine the $U$'s. But $\Delta$ is the sum of the squares of these
$m$-row determinants, hence it cannot vanish.
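The statement that $\Delta$ is the sum of the squares of the $m$-row determinants can be verified on a small system. The Python sketch below is an illustration added here; it takes all weights equal to 1 and uses, as an assumed example, the coefficients of a set of nine levelling equations in five unknowns. The function name `det` is hypothetical, and exact fractions are used so that the equality can be tested without rounding.

```python
from fractions import Fraction
from itertools import combinations


def det(mat):
    """Determinant by exact Gaussian elimination over the rationals."""
    m = [[Fraction(v) for v in row] for row in mat]
    size = len(m)
    d = Fraction(1)
    for col in range(size):
        pivot = next((r for r in range(col, size) if m[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            d = -d  # a row interchange changes the sign
        d *= m[col][col]
        for r in range(col + 1, size):
            f = m[r][col] / m[col][col]
            m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return d


# Coefficients of nine residual equations in five unknowns (weights all 1).
A = [
    [1, 0, 0, 0, 0], [-1, 1, 0, 0, 0], [0, -1, 0, 1, 0],
    [0, 1, 0, 0, 0], [0, -1, 1, 0, 0], [0, 0, -1, 1, 0],
    [0, 0, 0, 1, -1], [0, 0, 0, 0, 1], [0, 0, 0, 0, 1],
]
m = 5
normal = [[sum(row[i] * row[j] for row in A) for j in range(m)] for i in range(m)]
delta = det(normal)
sum_of_squares = sum(det([A[i] for i in rows]) ** 2
                     for rows in combinations(range(len(A)), m))
print(delta, sum_of_squares)
```

Since every term of the sum is a square, $\Delta$ can vanish only if every such determinant vanishes, which is the argument of the text.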

As an example of how to work these processes in practice,
we take a problem in levelling:

A above O = 573.08 ft. B above A = 2.60 ft. D above
B = 170.28 ft. B above O = 575.27 ft. C above B = 167.33
ft. D above C = 3.80 ft. D above E = 425.0 ft. E above
O = 319.91 ft. E above O = 319.75 ft.

We assume that all observations are equally trustworthy,
and take the weights equal to 1. We have

Residual equations.
\[\begin{aligned}
u_1 &= 573.08\\
-u_1 + u_2 &= 2.60\\
-u_2 + u_4 &= 170.28\\
u_2 &= 575.27\\
-u_2 + u_3 &= 167.33\\
-u_3 + u_4 &= 3.80\\
u_4 - u_5 &= 425.0\\
u_5 &= 319.91\\
u_5 &= 319.75
\end{aligned}\]

Normal equations.
\[\begin{aligned}
2u_1 - u_2 &= 570.48\\
-u_1 + 4u_2 - u_3 - u_4 &= 240.26\\
-u_2 + 2u_3 - u_4 &= 163.53\\
-u_2 - u_3 + 3u_4 - u_5 &= 599.08\\
-u_4 + 3u_5 &= 214.66
\end{aligned}\]
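The rule for writing down the normal equations is mechanical, and may be sketched in a few lines of Python. The block below is an illustration added to the text: it forms $[paa]$, $[pab]$, ... and $[pax]$, ... from the residual equations of the levelling problem, all weights being 1. The function name `normal_equations` is hypothetical.

```python
def normal_equations(rows, rhs, weights=None):
    """Form the normal equations N u = t from residual equations rows * u = rhs.

    N[i][j] = sum_k p_k * rows[k][i] * rows[k][j]
    t[i]    = sum_k p_k * rows[k][i] * rhs[k]
    """
    n = len(rows)
    m = len(rows[0])
    if weights is None:
        weights = [1.0] * n
    big_n = [[sum(p * r[i] * r[j] for p, r in zip(weights, rows))
              for j in range(m)] for i in range(m)]
    t = [sum(p * r[i] * x for p, r, x in zip(weights, rows, rhs))
         for i in range(m)]
    return big_n, t


# Residual equations of the levelling problem: coefficients of u1..u5.
rows = [
    [1, 0, 0, 0, 0], [-1, 1, 0, 0, 0], [0, -1, 0, 1, 0],
    [0, 1, 0, 0, 0], [0, -1, 1, 0, 0], [0, 0, -1, 1, 0],
    [0, 0, 0, 1, -1], [0, 0, 0, 0, 1], [0, 0, 0, 0, 1],
]
rhs = [573.08, 2.60, 170.28, 575.27, 167.33, 3.80, 425.0, 319.91, 319.75]
big_n, t = normal_equations(rows, rhs)
for coeffs, const in zip(big_n, t):
    print(coeffs, round(const, 2))
```

The array of coefficients so formed is symmetric, which is the property used later to shorten the written work.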


Eliminate $u_5$ from the last two,
\[-3u_2 - 3u_3 + 8u_4 = 2011.90.\]
Eliminate $u_1$ from the first two,
\[7u_2 - 2u_3 - 2u_4 = 1051.\]

Double the third,
\[-2u_2 + 4u_3 - 2u_4 = 327.06.\]

Eliminate $u_4$ twice,

\[25u_2 - 11u_3 = 6215.90,\]
\[9u_2 - 6u_3 = 723.94.\]

Dividing by 3,
\[3u_2 - 2u_3 = 241.31.\]

Multiply by 6 and subtract from the equation two places
above,
\[7u_2 + u_3 = 4768.04.\]

Double and add to the preceding,

\[17u_2 = 9777.39,\]
\[u_2 = 575.14.\]

Hence, finally,

\[u_1 = 572.81,\quad u_2 = 575.14,\quad u_3 = 742.05,\quad u_4 = 745.43,\quad u_5 = 320.03.\]

In this particular case the coefficients in the normal 
equations are unusually simple, and for that reason these 
equations are easily solved. Unfortunately, things do not 
always turn out so pleasantly. We must exhibit the standard 
method to be followed in the usual difficult cases. 

Two remarks are necessary at the outset. The first is that
it is necessary to provide some check on our work as we
proceed. The second that the determinant of the coefficients
in the normal equations is symmetric. In consequence, if we
write all that lies above the principal diagonal, we know the
rest. The method we shall pursue has two characteristics.
There is a check which is carried along automatically, and
each time we get a new set of equations with one less variable,
the determinant of the coefficients is symmetric.

Problem.
The following observations for level were made:

A above O = 115.52. B above A = 60.12. B above O = 177.04. C above
A = 234.12. C above B = 171.0. E above C = 682.25. E above D = 211.01.
D above B = 596.12. D above C = 427.18.

Find the various differences in level.


We first re-write the residual equations, putting the constant
on the left and calling it $r_i$. Then we suppress the
equality sign and put in a column of numbers $s_1, s_2, \ldots, s_n$, each
of which is the sum of the coefficients and constants in its
row; this column we call check.

\[\begin{array}{lll}
a_1u_1 + b_1u_2 + \cdots + m_1u_m & r_1 & s_1,\\
a_2u_1 + b_2u_2 + \cdots + m_2u_m & r_2 & s_2,\\
\cdots\\
a_nu_1 + b_nu_2 + \cdots + m_nu_m & r_n & s_n.
\end{array}\]
From these we get the normal equations which we write
in a similar manner, omitting whatever is below the principal
diagonal,

\[\begin{array}{rccl}
{[paa]}u_1 + [pab]u_2 + [pac]u_3 + \cdots + [pam]u_m & [par] & [pas] & \text{I}\\
{[pbb]}u_2 + [pbc]u_3 + \cdots + [pbm]u_m & [pbr] & [pbs] & \text{II}\\
{[pcc]}u_3 + \cdots + [pcm]u_m & [pcr] & [pcs] & \text{III}\\
\cdots\\
{[pmm]}u_m & [pmr] & [pms] & \text{M}
\end{array}\]
The last column is the check, and the sum of coefficients
and constants on its row. To check a row not written in
full, start at the top row and add downwards in any column
to the diagonal, then to the right. The sum should be the
check after the last term added. We next divide the first
equation by $[paa]$,
\[u_1 + \frac{[pab]}{[paa]}u_2 + \frac{[pac]}{[paa]}u_3 + \cdots + \frac{[pam]}{[paa]}u_m \qquad \frac{[par]}{[paa]} \qquad \frac{[pas]}{[paa]} \qquad \text{I}'\]
We now manipulate equations I and I$'$ as follows. We
multiply I$'$ by the coefficient of $u_2$ in I and subtract from II,
we multiply I$'$ by the coefficient of $u_3$ in I and subtract from
III, and so on. We get finally a new set of equations with
the following properties:
A) $u_1$ has been eliminated, so that there are $m - 1$ equations
in as many unknowns.
B) The determinant of the coefficients is symmetric.
C) The term in the last column checks the others.
These equations are identical in form with those numbered
I, II, ... M. We start again and eliminate another variable
in the same way.


As an example, let us try the equations we had before: 


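\[
\begin{array}{lrrl}
2u_1 - u_2 & -570.48 & -569.48 & \text{I} \\
4u_2 - u_3 - u_4 & -240.26 & -239.26 & \text{II} \\
2u_3 - u_4 & -163.53 & -163.53 & \text{III} \\
3u_4 - u_5 & -599.08 & -599.08 & \text{IV} \\
3u_5 & -214.66 & -212.66 & \text{V}
\end{array}
\]

\[
\begin{array}{lrrl}
u_1 - \tfrac{1}{2}u_2 & -285.24 & -284.74 & \text{I}' \\
\tfrac{7}{2}u_2 - u_3 - u_4 & -525.50 & -524.00 & \text{I} \\
2u_3 - u_4 & -163.53 & -163.53 & \text{II} \\
3u_4 - u_5 & -599.08 & -599.08 & \text{III} \\
3u_5 & -214.66 & -212.66 & \text{IV}
\end{array}
\]

\[
\begin{array}{lrrl}
u_2 - \tfrac{2}{7}u_3 - \tfrac{2}{7}u_4 & -150.14 & -149.71 & \text{I}' \\
\tfrac{12}{7}u_3 - \tfrac{9}{7}u_4 & -313.67 & -313.24 & \text{I} \\
\tfrac{19}{7}u_4 - u_5 & -749.22 & -748.79 & \text{II} \\
3u_5 & -214.66 & -212.66 & \text{III}
\end{array}
\]

\[
\begin{array}{lrrl}
u_3 - \tfrac{3}{4}u_4 & -182.97 & -182.72 & \text{I}' \\
\tfrac{7}{4}u_4 - u_5 & -984.47 & -983.72 & \text{I} \\
3u_5 & -214.66 & -212.66 & \text{II}
\end{array}
\]

\[
\begin{array}{lrrl}
u_4 - \tfrac{4}{7}u_5 & -562.55 & -562.13 & \text{I}'
\end{array}
\]

\[
\tfrac{17}{7}u_5 = 777.21.
\]
We thus get
\[
u_1 = 572.81; \quad u_2 = 575.14; \quad u_3 = 742.06; \quad u_4 = 745.43; \quad u_5 = 320.03.
\]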
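
The elimination with its running check column can be sketched in a few lines of Python. This is an illustrative sketch only, not the hand procedure verbatim: the function name `solve_with_check`, the tolerance, and the use of floating-point arithmetic are assumptions made here. Each row carries its coefficients, its constant (moved to the left-hand side, as in the tableau), and the check, which must remain the sum of the other entries after every step.

```python
def solve_with_check(A, r, tol=1e-9):
    """Solve the symmetric normal equations A u = r by successive elimination,
    carrying a check column equal to the sum of each row's other entries."""
    m = len(A)
    rows = []
    for i in range(m):
        # Coefficients, then the constant moved to the left (hence -r).
        row = [float(c) for c in A[i]] + [-float(r[i])]
        rows.append(row + [sum(row)])          # last entry is the check
    for k in range(m):
        pivot = rows[k][k]
        rows[k] = [entry / pivot for entry in rows[k]]      # equation I'
        for i in range(k + 1, m):
            factor = rows[i][k]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[k])]
        for row in rows[k:]:
            # Property C): the last column must still check the others.
            if abs(sum(row[:-1]) - row[-1]) > tol * max(1.0, abs(row[-1])):
                raise ArithmeticError("check column disagrees")
    u = [0.0] * m
    for k in reversed(range(m)):               # back substitution
        u[k] = -rows[k][m] - sum(rows[k][j] * u[j] for j in range(k + 1, m))
    return u


# The five normal equations of the worked example, written out in full
# (the tableau in the text omits the terms below the diagonal).
A = [[2, -1, 0, 0, 0],
     [-1, 4, -1, -1, 0],
     [0, -1, 2, -1, 0],
     [0, -1, -1, 3, -1],
     [0, 0, 0, -1, 3]]
r = [570.48, 240.26, 163.53, 599.08, 214.66]
u = solve_with_check(A, r)
```

With these inputs the computed values agree with those found above to within about a unit in the second decimal place.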


We must further caution the reader not to sit in the seat of 
the scornful, saying that this method turns out ever so much 
more cumbersome than the other. So it does in the present 
case, and so it often will when the coefficients in the normal 
equations are particularly simple. It is wiser in such cases 
to solve by the first method that comes to hand. But when 
the coefficients are complicated, the computer who waits for 
inspiration to find the best way to handle his equations, will
probably be ill inspired. Should the reader be anxious to
practice this standard proceeding on more complicated 
equations, he will have no difficulty in finding equations that 
will give him all the practice he desires. 

¶ We now turn to a question of great theoretical importance,
the weight to be attached to the solutions of the normal
equations. This calculation is so difficult that almost all
text-books omit it. The following development is the easiest
that we have seen. Let us begin by replacing our incorrect
residual equations by the true equations

\[
\begin{aligned}
a_1 U_1 + b_1 U_2 + \dots + m_1 U_m &= X_1, \\
a_2 U_1 + b_2 U_2 + \dots + m_2 U_m &= X_2, \\
\dots\dots\dots\dots & \\
a_n U_1 + b_n U_2 + \dots + m_n U_m &= X_n.
\end{aligned}
\tag{10}
\]

We have also a set of true equations, analogous to the
normal equations

\[
\begin{aligned}
{}[paa]\,U_1 + [pab]\,U_2 + \dots + [pam]\,U_m &= [paX], \\
{}[pba]\,U_1 + [pbb]\,U_2 + \dots + [pbm]\,U_m &= [pbX], \\
\dots\dots\dots\dots & \\
{}[pma]\,U_1 + [pmb]\,U_2 + \dots + [pmm]\,U_m &= [pmX].
\end{aligned}
\tag{11}
\]


Corresponding to the true errors $\xi_i$ of the observed quantities,
we shall have true errors of the quantities to be computed,
namely,
\[
\eta_j = u_j - U_j. \tag{12}
\]
It must be noted that there are $n$ $\xi$'s but only $m$ $\eta$'s. From
(11) and (9)
\[
\begin{aligned}
{}[paa]\,\eta_1 + [pab]\,\eta_2 + \dots + [pam]\,\eta_m &= [pa\xi], \\
{}[pba]\,\eta_1 + [pbb]\,\eta_2 + \dots + [pbm]\,\eta_m &= [pb\xi], \\
\dots\dots\dots\dots & \\
{}[pma]\,\eta_1 + [pmb]\,\eta_2 + \dots + [pmm]\,\eta_m &= [pm\xi];
\end{aligned}
\tag{13}
\]
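the solutions are
\[
\eta_1 = \frac{[pa\xi]\,\dfrac{\partial\Delta}{\partial[paa]} + [pb\xi]\,\dfrac{\partial\Delta}{\partial[pba]} + \dots + [pm\xi]\,\dfrac{\partial\Delta}{\partial[pma]}}{\Delta},
\tag{14}
\]
\[
\eta_2 = \frac{[pa\xi]\,\dfrac{\partial\Delta}{\partial[pab]} + [pb\xi]\,\dfrac{\partial\Delta}{\partial[pbb]} + \dots + [pm\xi]\,\dfrac{\partial\Delta}{\partial[pmb]}}{\Delta}.
\tag{15}
\]

We have further, by the elementary theory of determinants,
\[
\frac{[paa]\,\dfrac{\partial\Delta}{\partial[paa]} + [pba]\,\dfrac{\partial\Delta}{\partial[pba]} + \dots + [pma]\,\dfrac{\partial\Delta}{\partial[pma]}}{\Delta} = 1,
\tag{16}
\]
\[
[pak]\,\frac{\partial\Delta}{\partial[paa]} + [pbk]\,\frac{\partial\Delta}{\partial[pba]} + \dots + [pmk]\,\frac{\partial\Delta}{\partial[pma]} = 0, \qquad k \neq a,
\tag{17}
\]
\[
\frac{[pab]\,\dfrac{\partial\Delta}{\partial[pab]} + [pbb]\,\dfrac{\partial\Delta}{\partial[pbb]} + \dots + [pmb]\,\dfrac{\partial\Delta}{\partial[pmb]}}{\Delta} = 1,
\tag{18}
\]
\[
[pak]\,\frac{\partial\Delta}{\partial[pab]} + [pbk]\,\frac{\partial\Delta}{\partial[pbb]} + \dots + [pmk]\,\frac{\partial\Delta}{\partial[pmb]} = 0, \qquad k \neq b.
\tag{19}
\]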


We have assumed throughout that we knew the weights of
our observations, but not the corresponding precisions, whose
squares are proportional to them,
\[
p_i = 2\rho\,k_i^2. \tag{20}
\]
Here $\rho$ is a multiplier to be determined later. The quantities
$\eta_j$ are linear homogeneous combinations of the true errors $\xi_i$,
hence, by VII 12] they follow the Gaussian exponential law.
We must find their precisions, $K_j$. This we get by that same
theorem, namely,
\[
\frac{1}{2K_1^2} = \frac{1}{\Delta^2}\sum_{i=1}^{n} p_i^2\Big\{a_i\frac{\partial\Delta}{\partial[paa]} + b_i\frac{\partial\Delta}{\partial[pba]} + \dots + m_i\frac{\partial\Delta}{\partial[pma]}\Big\}^2\frac{1}{2k_i^2}.
\]
If we multiply equation (16) by $\dfrac{\partial\Delta}{\partial[paa]}$, and each equation
(17) by the corresponding $\dfrac{\partial\Delta}{\partial[pka]}$, and add, we find
\[
\frac{1}{2K_1^2} = \frac{\rho}{\Delta}\,\frac{\partial\Delta}{\partial[paa]} = \rho\,\frac{\partial\log\Delta}{\partial[paa]}.
\tag{21}
\]


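Another method of finding this would have been to find
the mean value of $\eta_1^2$. If we seek the mean value of $\eta_1\eta_2$, we
shall find, using equations (16)-(19),
\[
\text{Mean value } \eta_1\eta_2 = \rho\,\frac{\partial\log\Delta}{\partial[pab]}. \tag{22}
\]
Lastly, we find similarly
\[
\text{Mean value } \eta_1\xi_i = \rho\Big\{a_i\frac{\partial\log\Delta}{\partial[paa]} + b_i\frac{\partial\log\Delta}{\partial[pba]} + \dots + m_i\frac{\partial\log\Delta}{\partial[pma]}\Big\}.
\tag{23}
\]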


Why trouble about these last mean values? They are
needful to determine $\rho$. In the residual equations (6) let us
put the solutions of the normal equations and add corrections
$\delta_i$ on the right so that
\[
a_i u_1 + b_i u_2 + \dots + m_i u_m = x_i + \delta_i.
\]
We have also
\[
a_i U_1 + b_i U_2 + \dots + m_i U_m = X_i.
\]
Subtracting,
\[
a_i\eta_1 + b_i\eta_2 + \dots + m_i\eta_m - \xi_i = \delta_i.
\]


Let us square each of these equations, multiply by the
corresponding $p_i$, and take the mean value of the sums, in
view of (16)-(23).

The expression $[p\delta\delta]$, which is an observed quantity, is
equal to its own mean value. The equivalent expression on
the left contains only terms of the types (21), (22), and (23):
\[
[p\delta\delta] = \sum_{i=1}^{n} p_i\delta_i^2 = (n-m)\,\rho. \tag{24}
\]

Summing up, we get the final expression
\[
\frac{1}{2K_1^2} = \frac{\partial\log\Delta}{\partial[paa]}\,\frac{[p\delta\delta]}{(n-m)}, \qquad
\frac{1}{2K_2^2} = \frac{\partial\log\Delta}{\partial[pbb]}\,\frac{[p\delta\delta]}{(n-m)}.
\tag{25}
\]
\[
\text{Probable error of } u_1 = 0.6745\left(\frac{\partial\log\Delta}{\partial[paa]}\,\frac{[p\delta\delta]}{n-m}\right)^{\frac{1}{2}}.
\tag{26}
\]
It is clear that a similar type of calculation could be applied 
in the more general case (5). 
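
As a hedged illustration of how the probable error of an unknown might be computed from the determinant of the normal equations, the following Python sketch evaluates the logarithmic derivative of the determinant with respect to a diagonal element as a cofactor divided by the determinant. The function names, the small 2 by 2 matrix, the weighted sum of squared residuals 0.5, and the count of 7 observations are all hypothetical values chosen for illustration; none comes from the text.

```python
import math

def det(M):
    """Determinant by expansion along the first row (adequate for small M)."""
    if not M:
        return 1.0
    if len(M) == 1:
        return float(M[0][0])
    total = 0.0
    for j, entry in enumerate(M[0]):
        minor = [row[:j] + row[j + 1:] for row in M[1:]]
        total += (-1) ** j * entry * det(minor)
    return total

def dlog_delta(M, i):
    """d(log Delta)/d M[i][i]: the cofactor of the diagonal entry over Delta."""
    minor = [row[:i] + row[i + 1:] for k, row in enumerate(M) if k != i]
    return det(minor) / det(M)

def probable_errors(M, p_delta_delta, n):
    """Probable error of each unknown from the normal matrix M,
    the weighted sum of squared residuals, and n observations."""
    m = len(M)
    rho = p_delta_delta / (n - m)      # the multiplier, as in equation (24)
    return [0.6745 * math.sqrt(dlog_delta(M, i) * rho) for i in range(m)]

# Hypothetical normal matrix and residual sum, for illustration only.
normal = [[2.0, 1.0], [1.0, 3.0]]
errors = probable_errors(normal, p_delta_delta=0.5, n=7)
```

The cofactor ratio used here is also the corresponding diagonal element of the inverse of the normal matrix, which is how the same quantity is usually obtained in machine computation.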


§ 2. Conditioned Observations. 


It sometimes happens that the quantities which we seek 
are not independent, but are connected by certain identical 
relations. The problem is to find the best values for them 
subject to these restrictions. To begin with the most general 
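case, suppose that our quantities $u_1, u_2, \dots, u_m$ are connected
by the relations
\[
\phi_1(u_1, u_2, \dots, u_m) = \phi_2(u_1, u_2, \dots, u_m) = \dots = \phi_l(u_1, u_2, \dots, u_m) = 0. \tag{27}
\]

We assume that Assumption 2] may be extended to these
functions also, so that we may write these equations
\[
\phi_k(\theta_1, \theta_2, \dots, \theta_m) + \sum_{j=1}^{m}\frac{\partial\phi_k}{\partial\theta_j}\,\epsilon_j = 0. \tag{28}
\]
We have now a problem in relative minima, namely, to
minimize
\[
\sum_{i=1}^{n} p_i\Big[f_i(\theta_1, \dots, \theta_m) - x_i + \sum_{j=1}^{m}\frac{\partial f_i}{\partial\theta_j}\,\epsilon_j\Big]^2
+ 2\sum_{k=1}^{l} r_k\Big[\phi_k(\theta_1, \dots, \theta_m) + \sum_{j=1}^{m}\frac{\partial\phi_k}{\partial\theta_j}\,\epsilon_j\Big].
\tag{29}
\]

The $m$ partial derivatives of this, and the $l$ equations (28)
will be sufficient to determine the corrections $\epsilon_j$ and the
multipliers $r_k$.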

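A simple case arising frequently in practice is that where
we observe directly the quantities which we desire to calculate,
and where there is but one identical relation. We have
here a simpler form for (29), namely
\[
\sum_{i=1}^{m} p_i\epsilon_i^2 + 2r\Big[\phi(\theta_1, \dots, \theta_m) + \sum_{i=1}^{m}\frac{\partial\phi}{\partial\theta_i}\,\epsilon_i\Big],
\tag{30}
\]
\[
p_i\epsilon_i + r\,\frac{\partial\phi}{\partial\theta_i} = 0,
\]
\[
\epsilon_i = -\frac{\dfrac{1}{p_i}\,\dfrac{\partial\phi}{\partial\theta_i}\,\phi}{\displaystyle\sum_{k=1}^{m}\frac{1}{p_k}\Big(\frac{\partial\phi}{\partial\theta_k}\Big)^2}.
\tag{31}
\]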


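Example 1] The observed values of the angles of a triangle
are $\theta_1, \theta_2, \theta_3$; what are the best values?
\[
\phi = \theta_1 + \theta_2 + \theta_3 - \pi,
\]
\[
u_i = \theta_i - \tfrac{1}{3}(\theta_1 + \theta_2 + \theta_3 - \pi).
\]
Example 2] The observed sides of a right triangle are $a, b, c$;
what are the best values?
\[
\phi = a^2 + b^2 - c^2,
\]
\[
u_1 = a\Big[1 - \frac{a^2+b^2-c^2}{2(a^2+b^2+c^2)}\Big], \qquad
u_2 = b\Big[1 - \frac{a^2+b^2-c^2}{2(a^2+b^2+c^2)}\Big],
\]
\[
u_3 = c\Big[1 + \frac{a^2+b^2-c^2}{2(a^2+b^2+c^2)}\Big].
\]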
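
The single-condition correction can be tried numerically. The sketch below is a minimal Python illustration under stated assumptions: the function name `adjust`, the observed angles and the observed sides are hypothetical, equal weights are assumed in both examples, and the correction is the single linearized step described above, so a non-linear condition such as the right-triangle relation is satisfied only approximately afterwards.

```python
import math

def adjust(theta, weights, phi, grad):
    """One linearized correction of directly observed values theta
    subject to a single condition phi = 0 (weights are the p_i)."""
    g = grad(theta)
    denom = sum(gk * gk / pk for gk, pk in zip(g, weights))
    r = phi(theta) / denom                       # the multiplier
    return [t - r * gi / pi for t, gi, pi in zip(theta, g, weights)]

# Example 1: three observed angles (hypothetical values, in radians).
observed = [math.radians(60.1), math.radians(59.8), math.radians(60.4)]
angles = adjust(observed, [1, 1, 1],
                lambda t: sum(t) - math.pi,
                lambda t: [1.0, 1.0, 1.0])

# Example 2: observed sides of a right triangle (hypothetical values).
a, b, c = 3.02, 3.97, 5.01
sides = adjust([a, b, c], [1, 1, 1],
               lambda s: s[0] ** 2 + s[1] ** 2 - s[2] ** 2,
               lambda s: [2 * s[0], 2 * s[1], -2 * s[2]])
```

In the first example the excess over two right angles is shared equally among the three angles; in the second the two legs are scaled one way and the hypotenuse the other, by the same fraction.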


Problem.
The sides of a triangle of homogeneous material are measured, and the
area is determined by weight. Find the best values for the sides.


§ 3. Curve Fitting.


An interesting application of the method of least squares is
to the problem of finding a curve of given type which will
best represent a given function. The problem, stated in this
bald fashion, is evidently indeterminate, until we define the
term best represent by means of some postulates. Still,
there are several standard artifices which can be fairly well
justified.

Suppose, first, that we wish to find a power series development
for a given function $\phi(x)$. The reader might be
inclined to answer immediately that this is a very old 
problem indeed; we have merely to take Maclaurin’s series. 
The Maclaurin series is, indeed, absolutely correct if all the 
terms be included. This can never be done in practice. 
When we take merely a finite number of terms, the Maclaurin 
series has merely the property that it is the best possible 
representation of the function very near the origin, i.e. that 
it takes the same value there as does the function, and that if 
m+1 terms be taken the first m derivatives take the same 
value at the origin, whether we differentiate the function or 
the series. This does not by any means show that we could 
not perhaps find another polynomial of the same degree 
which gave a better average representation of the function 
throughout a certain interval. Let us write the general
polynomial of degree $m$

\[
a_0 + a_1 x + a_2 x^2 + \dots + a_m x^m.
\]

How shall we determine the coefficients so that this shall
best represent a given function throughout an interval?
The most obvious way would be to divide the interval into
$m$ equal parts and determine the $a$'s so that at each of the
$m+1$ bounding points the function and the polynomial had
the same values. This would be the method of interpolation.

A little reflection will now lead us to the idea that it is
unwise to limit ourselves to $m+1$ points. Why not take
a good many more points, so that the number of equations
will be larger than that of unknowns, and solve by the
methods developed in the present chapter? Let us take
the points $x_0, x_1, \dots, x_n$ equally spaced by the interval $\Delta x$.

The residual equations are

\[
\begin{aligned}
a_0 + a_1 x_0 + a_2 x_0^2 + \dots + a_m x_0^m &= \phi(x_0), \\
a_0 + a_1 x_1 + a_2 x_1^2 + \dots + a_m x_1^m &= \phi(x_1), \\
\dots\dots\dots\dots & \\
a_0 + a_1 x_n + a_2 x_n^2 + \dots + a_m x_n^m &= \phi(x_n).
\end{aligned}
\]


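The normal equations are
\[
\begin{aligned}
a_0\sum_i 1 + a_1\sum_i x_i + \dots + a_m\sum_i x_i^m &= \sum_i \phi(x_i), \\
a_0\sum_i x_i + a_1\sum_i x_i^2 + \dots + a_m\sum_i x_i^{m+1} &= \sum_i x_i\,\phi(x_i), \\
\dots\dots\dots\dots & \\
a_0\sum_i x_i^m + a_1\sum_i x_i^{m+1} + \dots + a_m\sum_i x_i^{2m} &= \sum_i x_i^m\,\phi(x_i).
\end{aligned}
\tag{32}
\]

The next step is obvious. Why not keep $x_0$ and $x_n$ fixed,
multiply each of these equations by $\Delta x$ and take the limit
as $\Delta x \to 0$?

\[
\begin{aligned}
a_0\int_{x_0}^{x_n} dx + a_1\int_{x_0}^{x_n} x\,dx + \dots + a_m\int_{x_0}^{x_n} x^m\,dx &= \int_{x_0}^{x_n}\phi(x)\,dx, \\
a_0\int_{x_0}^{x_n} x\,dx + a_1\int_{x_0}^{x_n} x^2\,dx + \dots + a_m\int_{x_0}^{x_n} x^{m+1}\,dx &= \int_{x_0}^{x_n} x\,\phi(x)\,dx, \\
\dots\dots\dots\dots & \\
a_0\int_{x_0}^{x_n} x^m\,dx + a_1\int_{x_0}^{x_n} x^{m+1}\,dx + \dots + a_m\int_{x_0}^{x_n} x^{2m}\,dx &= \int_{x_0}^{x_n} x^m\,\phi(x)\,dx.
\end{aligned}
\tag{33}
\]

Note that these equations determine $a_0, a_1, \dots, a_m$ so that
\[
\int_{x_0}^{x_n}\big[a_0 + a_1 x + \dots + a_m x^m - \phi(x)\big]^2 dx
\]
is a minimum.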

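Example] Find the value for $\sin x$ in the interval $-\dfrac{\pi}{2}$ to $\dfrac{\pi}{2}$
in the form of a polynomial
\[
a_0 + a_1 x + a_2 x^2 + a_3 x^3 + a_4 x^4.
\]

To begin with, it is well to have this polynomial vanish
with $x$. Moreover, we should like it, like $\sin x$, to be an odd
function. Hence we write
\[
a_0 = a_2 = a_4 = 0,
\]
\[
a_1\int_0^{\pi/2} x^2\,dx + a_3\int_0^{\pi/2} x^4\,dx = \int_0^{\pi/2} x\sin x\,dx,
\]
\[
a_1\int_0^{\pi/2} x^4\,dx + a_3\int_0^{\pi/2} x^6\,dx = \int_0^{\pi/2} x^3\sin x\,dx,
\]
\[
\frac{1}{3}\Big(\frac{\pi}{2}\Big)^3 a_1 + \frac{1}{5}\Big(\frac{\pi}{2}\Big)^5 a_3 = 1,
\]
\[
\frac{1}{5}\Big(\frac{\pi}{2}\Big)^5 a_1 + \frac{1}{7}\Big(\frac{\pi}{2}\Big)^7 a_3 = \frac{3\pi^2}{4} - 6,
\]
\[
a_3 = -0.1450, \qquad a_1 = 0.9888.
\]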

In the usual Maclaurin series
\[
a_3 = -0.1667, \qquad a_1 = 1.
\]
For $x = \dfrac{\pi}{2}$ the present series gives
\[
\sin\frac{\pi}{2} = 0.991.
\]
Two terms of Maclaurin's series will give
\[
\sin\frac{\pi}{2} = 0.924.
\]
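
The two coefficients of this example can be recomputed directly. The following Python sketch assumes the closed forms of the integrals over the interval from 0 to a right angle (the integral of $x\sin x$ is 1, that of $x^3\sin x$ is $3\pi^2/4 - 6$) and solves the pair of equations by determinants; the function name `fit_sine_cubic` is a label chosen here.

```python
import math

def fit_sine_cubic():
    """Least-squares coefficients a1, a3 of a1*x + a3*x**3 for sin x
    on the interval from 0 to pi/2 (odd polynomial, so the same on -pi/2..pi/2)."""
    h = math.pi / 2
    m2, m4, m6 = h ** 3 / 3, h ** 5 / 5, h ** 7 / 7   # integrals of x^2, x^4, x^6
    r1 = 1.0                                          # integral of x sin x
    r3 = 3 * math.pi ** 2 / 4 - 6                     # integral of x^3 sin x
    delta = m2 * m6 - m4 * m4
    a1 = (r1 * m6 - m4 * r3) / delta
    a3 = (m2 * r3 - m4 * r1) / delta
    return a1, a3

a1, a3 = fit_sine_cubic()
h = math.pi / 2
fitted_at_end = a1 * h + a3 * h ** 3          # value of the fitted cubic at pi/2
maclaurin_at_end = h - h ** 3 / 6             # two terms of the Maclaurin series
```

The fitted cubic gives up exactness at the origin in exchange for a much smaller error at the end of the interval, which is the point of the comparison above.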
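There frequently arise cases where we do not wish to
express a given function in power series, but in some other
shape. Suppose, for instance, that we wanted a trigonometric
series for the interval from $-\pi$ to $+\pi$ of the form
\[
a_0 + a_1\cos x + a_2\cos 2x + a_3\cos 3x + \dots + b_1\sin x + b_2\sin 2x + b_3\sin 3x + \dots.
\]
We replace our equations (33) by
\[
\int_{-\pi}^{\pi}\Big[\sum_k a_k\cos kx + b_k\sin kx\Big]dx = \int_{-\pi}^{\pi}\phi(x)\,dx,
\]
\[
\int_{-\pi}^{\pi}\cos lx\Big[\sum_k a_k\cos kx + b_k\sin kx\Big]dx = \int_{-\pi}^{\pi}\cos lx\,\phi(x)\,dx,
\]
\[
\int_{-\pi}^{\pi}\sin mx\Big[\sum_k a_k\cos kx + b_k\sin kx\Big]dx = \int_{-\pi}^{\pi}\sin mx\,\phi(x)\,dx.
\]
Now
\[
\int_{-\pi}^{\pi}\sin mx\,\sin kx\,dx = 0, \quad m \neq k,
\]
\[
\int_{-\pi}^{\pi}\sin kx\,\cos lx\,dx = 0,
\]
\[
\int_{-\pi}^{\pi}\cos kx\,\cos lx\,dx = 0, \quad k \neq l,
\]
\[
\int_{-\pi}^{\pi}\sin^2 kx\,dx = \pi, \quad k \neq 0,
\]
\[
\int_{-\pi}^{\pi}\cos^2 kx\,dx = \pi, \quad k \neq 0,
\]
\[
a_0 = \frac{1}{2\pi}\int_{-\pi}^{\pi}\phi(x)\,dx,
\]
\[
a_k = \frac{1}{\pi}\int_{-\pi}^{\pi}\cos kx\,\phi(x)\,dx,
\]
\[
b_k = \frac{1}{\pi}\int_{-\pi}^{\pi}\sin kx\,\phi(x)\,dx.
\]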


All this, however, is nothing in the world but the usual 
determination for the coefficients in a Fourier series, so that 
we reach the interesting result that whereas a finite number 
of terms of a Maclaurin series does not give the best polynomial 
development of a given function over any interval, any 
number of terms of the Fourier series will give, for the 
interval —7 to 7, the best development involving just those 
terms. 
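
The coefficient formulas can be evaluated numerically for any function given on the interval. The Python sketch below is illustrative only: it integrates by the midpoint rule, which is an assumption of this sketch and not a method from the text, and the function name `fourier_coefficients` and the test function $\phi(x) = x$ are chosen here for the example.

```python
import math

def fourier_coefficients(phi, terms, steps=4000):
    """Return (a0, [a_1..a_terms], [b_1..b_terms]) for phi on [-pi, pi],
    approximating each integral by the midpoint rule."""
    dx = 2 * math.pi / steps
    xs = [-math.pi + (i + 0.5) * dx for i in range(steps)]
    values = [phi(x) for x in xs]
    a0 = sum(values) * dx / (2 * math.pi)
    a = [sum(math.cos(k * x) * y for x, y in zip(xs, values)) * dx / math.pi
         for k in range(1, terms + 1)]
    b = [sum(math.sin(k * x) * y for x, y in zip(xs, values)) * dx / math.pi
         for k in range(1, terms + 1)]
    return a0, a, b

# For phi(x) = x the exact sine coefficients are 2, -1, 2/3, ...
a0, a, b = fourier_coefficients(lambda x: x, terms=3)
```

Because the coefficients are fixed one at a time, adding further terms to the series leaves the earlier coefficients unchanged, in contrast with the polynomial case.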

Problem.
Calculate cos x in the same way.

¶ Let us take a still more general case, and try to represent
$\phi$ by a function of known type and undetermined coefficients
\[
y = f(x, u_1, u_2, \dots, u_m).
\]
Following our precedent in the case of a polynomial where
we had an infinite number of points, we should like to minimize
\[
\int_{x_0}^{x_n}(f - \phi)^2\,dx.
\]
To do this, we equate to 0 the partial derivatives of this
integral with regard to the $u$'s, and throw in the supplementary
condition that the areas under the two curves should
be the same. We get
\[
\int_{x_0}^{x_n} f(x)\,dx = \int_{x_0}^{x_n}\phi(x)\,dx, \qquad
\int_{x_0}^{x_n} f\,\frac{\partial f}{\partial u_i}\,dx = \int_{x_0}^{x_n}\phi\,\frac{\partial f}{\partial u_i}\,dx.
\tag{34}
\]


The trouble here is the very prosaic one that, except when
$f$ is a polynomial, the eliminations are altogether unmanageable.
We must seek another method.* If we look closely
at the equations (33) and inquire into their geometrical
meaning, we find it to be this. The first $m+1$ moments of
the areas under the curve $\phi$ and under the polynomial, about
the $y$ axis, are equal to one another. This suggests the idea
that, in the general case, we should find the coefficients by
generalizing the process of equating these moments. We
thus replace the equations (34) by
\[
\int_{x_0}^{x_n} f(x)\,dx = \int_{x_0}^{x_n}\phi(x)\,dx, \qquad
\int_{x_0}^{x_n} x^k f(x)\,dx = \int_{x_0}^{x_n} x^k\phi(x)\,dx.
\tag{35}
\]

These equations will usually be easier to handle in practice.
It is to be noted also that we pass from (34) to (35) by
expanding $f$ in Maclaurin's series.

¶ The problem in practice frequently assumes a different
form, in that the function $\phi$ is not given, merely the $n+1$
pairs of points $(x_0, y_0), (x_1, y_1), \dots, (x_n, y_n)$. Let us imagine that
these are joined by a broken line which must replace the
curve $\phi$. We are faced with the laborious mathematical
problem of calculating the moments about the $y$ axis of
a series of trapezoids standing on the $x$ axis. For simplicity
we shall assume that the intervals on the $x$ axis are all equal
so that
\[
x_{k+1} - x_k = b, \qquad x_k = x_0 + kb.
\]
For the upper (or lower) side of the $(k+1)$th trapezoid
\[
y = y_k + \frac{y_{k+1} - y_k}{b}\,(x - x_k).
\]


* Cf. Karl Pearson, 'On the Systematic Fitting of Curves', Biometrika, vol. i,
1901-2, and vol. ii, 1902-3.



The $m$th moment of this trapezoid is
\[
\int_{x_k}^{x_k+b} x^m\Big[y_k + \frac{y_{k+1}-y_k}{b}(x - x_k)\Big]dx
= \frac{1}{b}\Big[\big(y_k(x_k+b) - y_{k+1}x_k\big)\frac{x^{m+1}}{m+1} + (y_{k+1}-y_k)\frac{x^{m+2}}{m+2}\Big]_{x_k}^{x_k+b}
\]
\[
= \frac{1}{b(m+1)(m+2)}\Big\{y_k\big[(x_k+b)^{m+2} - x_k^{m+2} - b(m+2)x_k^{m+1}\big]
+ y_{k+1}\big[x_k^{m+2} - (x_k+b)^{m+2} + b(m+2)(x_k+b)^{m+1}\big]\Big\}.
\]
We get the total coefficient of $y_k$ by adding the first part of
this to the last of the preceding term, namely
\[
\frac{1}{b(m+1)(m+2)}\big[(x_k+b)^{m+2} + (x_k-b)^{m+2} - 2x_k^{m+2}\big]
= 2b\Big[\frac{x_k^m}{2!} + \frac{m(m-1)}{4!}x_k^{m-2}b^2 + \frac{m(m-1)(m-2)(m-3)}{6!}x_k^{m-4}b^4 + \dots\Big].
\]

There remain the end $y$'s, each of which appears in only
one term. The total coefficient of $y_0$ is
\[
\frac{1}{b(m+1)(m+2)}\big[(x_0+b)^{m+2} - x_0^{m+2} - b(m+2)x_0^{m+1}\big]
\]
\[
= 2b\Big[\frac{x_0^m}{2!} + \frac{m(m-1)}{4!}x_0^{m-2}b^2 + \dots\Big]
- \frac{1}{b(m+1)(m+2)}\big[(x_0-b)^{m+2} - x_0^{m+2} + (m+2)\,b\,x_0^{m+1}\big].
\]
The total coefficient of $y_n$ is
\[
\frac{1}{b(m+1)(m+2)}\big[(x_n-b)^{m+2} - x_n^{m+2} + b(m+2)x_n^{m+1}\big]
\]
\[
= 2b\Big[\frac{x_n^m}{2!} + \frac{m(m-1)}{4!}x_n^{m-2}b^2 + \dots\Big]
- \frac{1}{b(m+1)(m+2)}\big[(x_n+b)^{m+2} - x_n^{m+2} - (m+2)\,b\,x_n^{m+1}\big].
\]
Lastly, if we put
\[
s_m = \sum_{k=0}^{n} x_k^m y_k,
\]
the total $m$th moment is
\[
m!\,2b\Big[\frac{1}{2!\,m!}s_m + \frac{1}{4!\,(m-2)!}s_{m-2}b^2 + \frac{1}{6!\,(m-4)!}s_{m-4}b^4 + \dots\Big]
\]
\[
- \frac{1}{b(m+1)(m+2)}\Big\{y_0\big[(x_0-b)^{m+2} - x_0^{m+2} + (m+2)x_0^{m+1}b\big]
+ y_n\big[(x_n+b)^{m+2} - x_n^{m+2} - (m+2)x_n^{m+1}b\big]\Big\}.
\tag{36}
\]

This unlovely formula probably represents the simplest
form attainable. It is visibly simpler in the case where
$y_0 = y_n = 0$.
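
As a check on the algebra, the moment of the broken line can be computed in two ways and compared. The Python sketch below is an illustration under stated assumptions: `trapezoid_moment` sums the exact moment of each trapezoid, while `closed_form_moment` evaluates the summed expression with its two end corrections; both names and the sample ordinates are hypothetical.

```python
from math import factorial

def trapezoid_moment(x0, b, ys, m):
    """m-th moment about the y axis of the broken line through the points
    (x0 + k*b, ys[k]), summed trapezoid by trapezoid."""
    c = 1.0 / (b * (m + 1) * (m + 2))
    total = 0.0
    for k in range(len(ys) - 1):
        xk = x0 + k * b
        left = (xk + b) ** (m + 2) - xk ** (m + 2) - b * (m + 2) * xk ** (m + 1)
        right = xk ** (m + 2) - (xk + b) ** (m + 2) + b * (m + 2) * (xk + b) ** (m + 1)
        total += c * (ys[k] * left + ys[k + 1] * right)
    return total

def closed_form_moment(x0, b, ys, m):
    """The same moment from the sums s_j = sum of x_k**j * y_k,
    with the corrections for the two end ordinates."""
    n = len(ys) - 1
    xs = [x0 + k * b for k in range(n + 1)]

    def s(j):
        return sum(x ** j * y for x, y in zip(xs, ys))

    series = 0.0
    for t in range(m // 2 + 1):
        j = m - 2 * t
        series += s(j) * b ** (2 * t) / (factorial(2 * t + 2) * factorial(j))
    series *= factorial(m) * 2 * b
    c = 1.0 / (b * (m + 1) * (m + 2))
    end0 = (xs[0] - b) ** (m + 2) - xs[0] ** (m + 2) + (m + 2) * xs[0] ** (m + 1) * b
    endn = (xs[n] + b) ** (m + 2) - xs[n] ** (m + 2) - (m + 2) * xs[n] ** (m + 1) * b
    return series - c * (ys[0] * end0 + ys[n] * endn)

sample = [0.3, 1.1, 2.0, 1.4, 0.2]      # hypothetical ordinates
pairs = [(trapezoid_moment(0.5, 0.25, sample, m),
          closed_form_moment(0.5, 0.25, sample, m)) for m in range(5)]
```

The zeroth moment is simply the area given by the trapezoidal rule, which offers a quick sanity check on either function.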

The whole subject of curve fitting leads us naturally to 
a topic which has attained a large development, through the 
efforts of the large number of writers on mathematical 
statistics. In England Karl Pearson has founded a whole 
school, and on the Continent the Scandinavians have shown 
themselves particularly skilful in developing the theory. A
thorough discussion of all of these new methods will be found
in Arne Fisher’s Mathematical Theory of Probabilities.*


* 2nd edition, New York, 1922, Parts II and III.



CHAPTER X 
THE STATISTICAL THEORY OF GASES 
§ 1. General Properties of Perfect Gases 


In Chapter V, which dealt with geometrical probability,
we excused ourselves for lingering over such an elegant trifle
on the ground of the connexion with the kinetic theory of
gas, and the methods of statistical mechanics. It is now
time to give a very summary introduction to these extended 
topics.* We shall content ourselves with giving the method 
for deriving Maxwell’s expression for normal distribution, 
and critical comments thereon; it is not our business to 
deduce physical properties. 

A gas, for our present purposes, is conceived as a very 
large agglomeration of very small molecules in rapid motion. 
We consider the case of a gas confined in some vessel, and, as 
a first approximation, make the following assumptions : 

I) The vessel is of finite volume, with perfectly elastic 
walls which are surfaces given by differentiable equations. 

II) All gas molecules are smooth, incompressible, perfectly
elastic spheres of uniform diameter $\sigma$ and mass, acting under
the influence of no forces.

* In the present chapter I have leaned very heavily on Castelnuovo, loc.
cit., ch. xiii. For a more detailed study see Jeans, Dynamical Theory of Gases,
3rd edition, Cambridge, 1921, chs. ii-iv.

It is evident that, under Newton’s second law, each molecule
will be in a state of rectilinear motion at uniform
velocity, or at rest, except when its course is altered by
a collision with a boundary wall or with another molecule.
We are not concerned with the actual shell of the vessel, but
with a surface parallel thereto, a radius distance inside, for
this is the effective limit for the centre of a molecule; when
we speak of the boundary, we mean this latter surface. The
laws of collision will then be the following:

A) When two spheres collide, the motion of their centre of 
gravity is unaltered, and the vector velocity of one centre 
with regard to the other is replaced by its reflection in the 
common tangent plane at the point of impact. 

B) If the centre of a sphere meet a boundary, the vector 
velocity after impact is the reflection of the vector velocity 
before impact, in the tangent plane to the boundary at that 
point. 

It is possible that some molecules will strike exactly into 
cracks between two parts of the surface, which amounts to 
a molecule centre striking a double curve of the boundary, 
but this case will occur an infinitesimal number of times, and 
may be overlooked.

Since the rotations of the spheres are of no importance, the 
essential point to be borne in mind is that the total vis viva 
of the system will be constant. The phenomena which we call 
‘temperature’ and ‘pressure’ depend upon molecular velocities; 
it is with them that we shall be specially occupied. 

Let the total number of molecules be $n$. The coordinates
of the centre of the $i$th molecule shall be $x_i, y_i, z_i$, the
components of its velocity $u_i, v_i, w_i$. Since the motion is
unaccelerated, a knowledge at any instant of the $6n$ quantities
$x_1 y_1 z_1 \dots x_n y_n z_n\, u_1 v_1 w_1 \dots u_n v_n w_n$ will give a complete account
of the state of the gas. Moreover, this knowledge will,
theoretically, serve to predict the exact state at any future
instant; in other words, the whole history is determined by
the initial conditions of situation and velocity.


§ 2. Representation in Hyperspace. 


There is a great saving in words, in dealing with the gas 
problem, if we use the language, or jargon, of the geometry 
of many dimensions. The reader must not allow himself to
be unduly alarmed by this proceeding. If an object be
determined by $N$ independent variables, we say that each
of its determinations corresponds to a point in a space of $N$
dimensions. When the variables are connected by one or
more equations, we say that we have a variety in the original
space, the number of dimensions of the variety being that of
the space, less the number of independent equations. A
variety of $N-1$ dimensions is called a hypersurface; if the
equation be linear we call it a hyperplane. When we speak
of the distance of two points, we mean the expression obtained 
by analogy from the expression for the distance of two points 
in three dimensional space in terms of their rectangular 
cartesian coordinates. The general laws for combining dis- 
tances are the same no matter how many the dimensions. 

We start with a space of $6n$ dimensions where a point has
the coordinates $x_1 y_1 z_1 \dots x_n y_n z_n\, u_1 v_1 w_1 \dots u_n v_n w_n$. Such a
point will represent the state of a gas of $n$ molecules. As
the gas changes with time, so does the point move in the
given space. What is the nature of its path? Since each
molecule is moving at a uniform velocity along a straight
path we have
\[
\frac{dx_i}{dt} = u_i, \qquad \frac{dy_i}{dt} = v_i, \qquad \frac{dz_i}{dt} = w_i. \tag{1}
\]

The representing point is moving along, at uniform velocity,
on a path parallel to the flat variety of $3n$ dimensions
obtained by giving the last $3n$ coordinates fixed values.
There will be a sharp break in this path corresponding to
each collision in the gas. To understand these, we must first
note that in the space of $6n$ dimensions there are certain
limiting hypersurfaces which the representing point cannot
pass. The centre of no molecule can pass the boundary, and
the distance between two centres can never be less than $\sigma$,
hence
\[
\begin{aligned}
f(x_i, y_i, z_i) &\ge 0, & i &= 1, 2, \dots, n, \\
(x_i - x_k)^2 + (y_i - y_k)^2 + (z_i - z_k)^2 - \sigma^2 &\ge 0, & i, k &= 1, \dots, n, \quad i \neq k.
\end{aligned}
\tag{2}
\]


Moreover, the vis viva of the system has the constant value
$E$, hence:
\[
\sum_{i=1}^{n}(u_i^2 + v_i^2 + w_i^2) - E = 0. \tag{3}
\]

The representing point in 6n dimensions must thus remain 
on the hypersurface (3) moving along a straight line. At the 
moment of a collision in the gas it takes a sharp jump along 
one of the hypersurfaces (2). 

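We get a clearer idea by using a slightly simpler representation,
namely, taking the space of $3n$ dimensions, where
a point has the coordinates $x_1 y_1 z_1 \dots x_n y_n z_n$. The $3n$ quantities
$u_1 v_1 w_1 \dots u_n v_n w_n$ are the components of the velocity of this
point, the path will be rectilinear, and the velocity uniform
until the moment when there is a collision in the gas. If
a molecule collide with the boundary $f = 0$ we have the
following relation between the components of velocity before
and after:
\[
u_i' = -2\,\frac{\partial f}{\partial x_i}\,
\frac{u_i\dfrac{\partial f}{\partial x_i} + v_i\dfrac{\partial f}{\partial y_i} + w_i\dfrac{\partial f}{\partial z_i}}
{\Big(\dfrac{\partial f}{\partial x_i}\Big)^2 + \Big(\dfrac{\partial f}{\partial y_i}\Big)^2 + \Big(\dfrac{\partial f}{\partial z_i}\Big)^2} + u_i.
\tag{4}
\]

There will be similar equations for $v_i'$ and $w_i'$. On the
other hand, if the molecules $x_j y_j z_j$ and $x_k y_k z_k$ collide, we have
\[
\begin{aligned}
& x_k = x_j + l\sigma, \quad y_k = y_j + m\sigma, \quad z_k = z_j + n\sigma, \\
& l^2 + m^2 + n^2 = 1, \\
& u_j' + u_k' = u_j + u_k; \quad v_j' + v_k' = v_j + v_k; \quad w_j' + w_k' = w_j + w_k, \\
& u_j' - u_k' = -2l\,[\,l(u_j - u_k) + m(v_j - v_k) + n(w_j - w_k)\,] + u_j - u_k, \\
& u_j' = -l\,[\,l(u_j - u_k) + m(v_j - v_k) + n(w_j - w_k)\,] + u_j, \\
& u_k' = l\,[\,l(u_j - u_k) + m(v_j - v_k) + n(w_j - w_k)\,] + u_k.
\end{aligned}
\tag{5}
\]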
These equations have a simple meaning. Let us suppose
that the laws of elastic bodies are the same in the space of $3n$
dimensions as they are in our space, i.e. when an elastic
point encounters a hypersurface, it bounds off with a vector
velocity which is the reflection in the tangent hyperplane of
the previous vector velocity. The interpretation of equations
(4) and (5) is then as follows: *


Theorem 1] If n gas molecules be represented by a single
point in a space of 3n dimensions, the alterations of
the gas are perfectly represented by the motion of this
point as it traces a straight line with the uniform
velocity given by (1) or rebounds elastically from one
of the hypersurfaces given by (2), the square of its
velocity being constantly given by (3).
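
The two laws of collision are easy to try numerically. The Python sketch below is a minimal illustration, assuming equal masses and smooth spheres as in the text; the function names `reflect` and `collide`, the sample velocities and the direction of the line of centres are hypothetical. It shows that both laws leave the sum of the squared velocities unchanged, and that a collision between two molecules also preserves the sum of their velocities.

```python
def reflect(v, normal):
    """Velocity after impact on the boundary: reflection of v in the
    tangent plane whose normal is proportional to the gradient of f."""
    nn = sum(c * c for c in normal)
    dot = sum(a * c for a, c in zip(v, normal))
    return [a - 2 * c * dot / nn for a, c in zip(v, normal)]

def collide(vj, vk, direction):
    """Velocities after two equal spheres collide; direction is the unit
    vector (l, m, n) from the centre of j to the centre of k."""
    rel = sum(d * (a - b) for d, a, b in zip(direction, vj, vk))
    new_vj = [a - d * rel for a, d in zip(vj, direction)]
    new_vk = [b + d * rel for b, d in zip(vk, direction)]
    return new_vj, new_vk

def square(v):
    return sum(c * c for c in v)

vj, vk = [1.0, 0.5, -0.2], [-0.3, 0.1, 0.4]        # hypothetical velocities
line = (2.0 / 3.0, 1.0 / 3.0, 2.0 / 3.0)           # unit vector, hypothetical
new_vj, new_vk = collide(vj, vk, line)
bounced = reflect(vj, [0.0, 0.0, 1.0])
```

Only the component of the relative velocity along the line of centres is exchanged; the components perpendicular to it are untouched, which is what reflection in the common tangent plane means.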

There is another conclusion of a more complicated nature
which can be drawn from equations (4) and (5). We have
\[
\frac{\partial(u_i', v_i', w_i')}{\partial(u_i, v_i, w_i)}
= \frac{\partial(u_j', v_j', w_j', u_k', v_k', w_k')}{\partial(u_j, v_j, w_j, u_k, v_k, w_k)} = -1,
\]
so that
\[
\left|\frac{\partial(x_1', y_1', z_1', \dots, u_n', v_n', w_n')}{\partial(x_1, y_1, z_1, \dots, u_n, v_n, w_n)}\right| = 1.
\]
If, then, we express our volume element in $6n$ dimensions
as we do in 3 dimensions, we see that the transformations
(4) and (5) are of a sort to preserve volumes. Suppose that
the probability that a point in this space should lie close to
an assigned position be proportional to a quantity differing
by an infinitesimal of higher order from
\[
dx_1\,dy_1\,dz_1 \dots dx_n\,dy_n\,dz_n\; du_1\,dv_1\,dw_1 \dots du_n\,dv_n\,dw_n,
\]
the element of volume, then after any time there is an equal
probability that it will lie equally near to the transformed
position. This leads to:


Assumption 1] The probability that a gas should have properties
analytically expressible in terms of the 6n coordinates
is proportional to the volume of that portion of the
space of 6n dimensions in which the representing point
must then lie.

* Cf. Borel, ‘Sur les principes de la théorie cinétique du gaz’, Annales de
l'École Normale, Series 2, vol. xxvi, 1904, p. 24.

Our work in geometrical probability shows how arbitrary
this assumption is, for it amounts to assuming that
\[
x_1\, y_1\, z_1 \dots u_n\, v_n\, w_n
\]
are the natural independent variables, and that the element
of volume is given by the differential expression above.
From this assumption, and from the proof that the large
Jacobian written above is equal to 1, we deduce:


Theorem 2] If a set of gas molecules be such that at a time $t_1$
there is a certain probability that it will possess
a property of the type mentioned in Assumption 1]
then there is an equal probability that at the time $t_2$ it
will possess the transformed property.


It is scarcely necessary to warn the reader that the word 
‘probability’ as used in this chapter must be understood in 
the statistical sense that we defined in Ch. I and have con- 
sistently used throughout the present work. 


§ 3. First Deduction of Maxwell’s Law. 


Our Theorem 2] tells us that the probability that a gas
should have a certain property is proportional to the volume
of a certain region in the space of $6n$ dimensions. If, however,
this property have to do merely with the vector velocities
of the particles, as the only limitation on these is the equation
(3), we may confine ourselves to a representation in a space of
$3n$ dimensions only, where the coordinates are
\[
u_1\, v_1\, w_1 \dots u_n\, v_n\, w_n.
\]

Now, by the form of Assumption 1] the probability that
a certain gas should have a certain property is proportional
to the volume of a certain region in this space of $3n$ dimensions,
not proportional to the hyper-area cut by the region from
the hyper-sphere (3). The element of volume is
\[
du_1\,dv_1\,dw_1 \dots du_n\,dv_n\,dw_n = (du_1\,dv_1\,dw_1) \dots (du_n\,dv_n\,dw_n),
\]
and is the product of $n$ different volume elements in the
three-dimensional space $u, v, w$. Hence the probability
sought is proportional to the product of $n$ different volumes
in three-dimensional space. These $n$ points we shall call
‘velocity points’.

Suppose next that the three-dimensional velocity space
$u, v, w$ is divided into a very large number $\nu$ of cells of equal
volume $V$. When we say a large number, we mean that, as
a first approximation, the coordinates of all points in one cell
may be taken as identical. We shall assume, however, that
$n$ is so large that it is well above $\nu$. The probability that the
first $a_1$ points shall be in the first cell, the second $a_2$ in
the second, the last $a_\nu$ in the last is proportional to
\[
V^{a_1}\,V^{a_2} \dots V^{a_\nu},
\]
and since we have
\[
a_1 + a_2 + \dots + a_\nu = n, \tag{6}
\]
this is $V^n$.

On the other hand, the probability that some $a_1$ lie in the
first cell, some $a_2$ in the second, some $a_\nu$ in the last is
proportional to this number multiplied by the number of ways
in which we can divide $n$ objects into distinguishable groups
of $a_1, a_2, \dots, a_\nu$ respectively. This is given by Ch. II (3), so
that our probability is proportional to
\[
P = \frac{n!}{a_1!\,a_2! \dots a_\nu!}. \tag{7}
\]


Our fundamental question is to find a set of values
\[
u_1\, v_1\, w_1 \dots u_n\, v_n\, w_n
\]
subject to (3), which will make this a maximum.
Since $P$ is a maximum with its logarithm, we must maximize
\[
\log(n!) - \log(a_1!) - \log(a_2!) - \dots - \log(a_\nu!).
\]
By Stirling's formula
\[
\log(r!) = (r + \tfrac{1}{2})\log r - r + \tfrac{1}{2}\log 2\pi.
\]
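
How close this approximation is can be seen numerically. The Python sketch below compares it with the exact logarithm of the factorial, obtained from the standard library function `math.lgamma`; the function names `stirling_log_factorial` and `stirling_error` and the sample values of r are choices made for this illustration.

```python
import math

def stirling_log_factorial(r):
    """Approximate log(r!) by (r + 1/2) log r - r + (1/2) log(2 pi)."""
    return (r + 0.5) * math.log(r) - r + 0.5 * math.log(2 * math.pi)

def stirling_error(r):
    """Exact log(r!) minus the approximation; lgamma(r + 1) equals log(r!)."""
    return math.lgamma(r + 1) - stirling_log_factorial(r)

errors = {r: stirling_error(r) for r in (1, 5, 50, 1000)}
```

The error is positive and falls off roughly as the reciprocal of twelve times r, so it is negligible only when every factorial involved has a large argument.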
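Hence, we must minimize
\[
\sum_{i=1}^{\nu}(a_i + \tfrac{1}{2})\log a_i.
\]
Assuming all of the $a_i$'s are large, it will suffice to minimize
\[
\sum_{i=1}^{\nu} a_i\log a_i, \qquad \sum_{i=1}^{\nu} a_i = n, \qquad \sum_{i=1}^{\nu} a_i(u_i^2 + v_i^2 + w_i^2) = E.
\]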


Strictly speaking, the quantities $a_i$ are integers, but we
may obtain a sufficiently accurate answer by treating them
as if they were capable of continuous variation, and looking
for a relative minimum. We have thus
\[
\frac{\partial}{\partial a_i}\Big[\sum_{i=1}^{\nu} a_i\{\log a_i - r(u_i^2 + v_i^2 + w_i^2) - s\} + rE + sn\Big] = 0,
\]
\[
\log a_i = r(u_i^2 + v_i^2 + w_i^2) - 1 + s.
\]

In view of (3) we must expect $a_i$ to decrease as $u^2 + v^2 + w^2$
increases, so that we write
\[
a_i = k_1 e^{-h^2(u_i^2 + v_i^2 + w_i^2)}.
\]


In words, this tells us that the most likely distribution of
velocities is that where the number close to the values $u, v, w$
is proportional to
\[
nk\,e^{-h^2(u^2 + v^2 + w^2)}\,du\,dv\,dw.
\]

To find the values of the constants, we have, first,
\[
n = nk\int_{-\infty}^{\infty} e^{-h^2u^2}\,du\int_{-\infty}^{\infty} e^{-h^2v^2}\,dv\int_{-\infty}^{\infty} e^{-h^2w^2}\,dw,
\]
\[
k = \Big(\frac{h}{\sqrt{\pi}}\Big)^3.
\]

This process is very much the sort of ‘near mathematics’
to which we have frequently resorted, for we have integrated
between infinite limits, whereas in view of (3) every velocity
component must be less than $\sqrt{E}$. However, since the value
$e^{-h^2u^2}$ is very small for every large $u$, the error is not
serious. To find $h$, we must remember that the number of
molecules in a cell is about $n$ times the probability that
a molecule should be in that cell, hence
\[
E = n\Big(\frac{h}{\sqrt{\pi}}\Big)^3\iiint (u^2 + v^2 + w^2)\,e^{-h^2(u^2 + v^2 + w^2)}\,du\,dv\,dw
\]
\[
= 3n\Big(\frac{h}{\sqrt{\pi}}\Big)^3\int_{-\infty}^{\infty} u^2 e^{-h^2u^2}\,du\int_{-\infty}^{\infty} e^{-h^2v^2}\,dv\int_{-\infty}^{\infty} e^{-h^2w^2}\,dw
= \frac{3n}{2h^2},
\]
\[
h = \sqrt{\frac{3n}{2E}}.
\]


Theorem 3] The most probable distribution of velocities among
the molecules of a gas is that where the number of those
having velocities within the limits $u \pm \tfrac{1}{2}du$, $v \pm \tfrac{1}{2}dv$,
$w \pm \tfrac{1}{2}dw$ is nearly equal to
\[
n\Big(\frac{h}{\sqrt{\pi}}\Big)^3 e^{-h^2(u^2 + v^2 + w^2)}\,du\,dv\,dw, \qquad h = \sqrt{\frac{3n}{2E}}.
\]
Here n is the number of molecules, E the given vis
viva, the mass of each being taken as 2.
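
The two constants of the law can be checked numerically. The Python sketch below is illustrative only: it assumes the relation between h, the number of molecules and the vis viva derived above, integrates by the midpoint rule over a finite range (an assumption of this sketch), and uses hypothetical values of n and E. The names `maxwell_h`, `expected_number` and `check_constants` are chosen here.

```python
import math

def maxwell_h(n, E):
    """h from E = 3n / (2 h^2), each molecule having mass 2."""
    return math.sqrt(3 * n / (2 * E))

def expected_number(n, E, u, v, w, du, dv, dw):
    """Expected number of molecules in a small cell about (u, v, w)."""
    h = maxwell_h(n, E)
    k = (h / math.sqrt(math.pi)) ** 3
    return n * k * math.exp(-h * h * (u * u + v * v + w * w)) * du * dv * dw

def check_constants(n, E, steps=4000, reach=10.0):
    """Return the total number and the total vis viva implied by the law."""
    h = maxwell_h(n, E)
    limit = reach / h
    du = 2 * limit / steps
    us = [-limit + (i + 0.5) * du for i in range(steps)]
    density = [h / math.sqrt(math.pi) * math.exp(-(h * u) ** 2) for u in us]
    i0 = sum(density) * du                                   # should be 1
    i2 = sum(u * u * d for u, d in zip(us, density)) * du    # should be 1/(2 h^2)
    return n * i0 ** 3, 3 * n * i2 * i0 ** 2

total, energy = check_constants(n=1000, E=3000.0)
```

Integrating over a finite range instead of infinite limits changes nothing visible here, which bears out the remark that the error from the infinite limits is not serious.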


This is Maxwell’s law for the distribution of velocities ; 
a gas in this condition is said to be in a normal state. The 
equation is so well known historically that we reproduce his 
original proof.*

‘Let N be the whole number of particles. Let u, v, w be
the components of the velocity in the three rectangular
directions, and let the number of particles for which u lies
between u and u+du be N f(u) du, where f is a function to
be determined.

The number of particles for which v lies between v and
v+dv will be N f(v) dv, and the number for which w lies
between w and w+dw will be N f(w) dw, where f always
stands for the same function.

Now the existence of the velocity u does not, in any way,
affect the existence of the velocities v or w, since they are at
right angles to each other and independent, so that the
number of particles whose velocities lie between u and u+du
and between v and v+dv, and also between w and w+dw, is
\[
N f(u)\,f(v)\,f(w)\,du\,dv\,dw.
\]

If we imagine N particles to start from the origin at the
same instant, then this will be the number in the unit of
volume du dv dw after the unit of time, and the number per
unit volume will be $N f(u)\,f(v)\,f(w)$.

But the directions of the coordinates are perfectly arbitrary,
and therefore this number must depend upon the distance
from the origin alone; that is
\[
f(u)\,f(v)\,f(w) = \phi(u^2 + v^2 + w^2).
\]

Solving this functional equation, we find
\[
f(u) = Ce^{Au^2}, \qquad \phi(u^2 + v^2 + w^2) = C^3 e^{A(u^2 + v^2 + w^2)}.’
\]

* Maxwell, Collected Works, vol. i, pp. 380 ff. Also Jeans, loc. cit., p. 55.

This simple proof is, unfortunately, very unsound, as it is
not at all clear that the components of the velocities may be
treated as independent variables, and in fact, if one component
were equal to the square root of the vis viva, all others would
have to be equal to 0.*


7 $ 4. Amplification of the Preceding Proof. 


There aro certain points in the deduction of the Normal 
Law of Maxwell which deserve more careful mathematical 
investigation: it is now time to return to them. Logically, 
we should have cleaned up everything as we went along, but 
this would have burdened the argument to an unbearable 
extent. 

The first point to notice is the fundamental role played by 
Stirling’s formula. Now that formula, as stated in Ch. III, 
tells us that, 


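$$\frac{n!}{n^n e^{-n} (2\pi n)^{1/2}} < 1 + \frac{1}{10n};$$


hence the error made in evaluating
$$\log a_1! + \log a_2! + \dots + \log a_\nu!$$
is of the same order of magnitude as
$$\frac{1}{a_1} + \frac{1}{a_2} + \dots + \frac{1}{a_\nu},$$
or as ν/a_i.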


Both of these quantities are large; it is not clear what the 
nature of their ratio will be. 
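
The size of the error committed by Stirling's formula may be examined numerically. The sketch below assumes a bound of the form 1 + 1/(10n) for the ratio of n! to its Stirling approximation, and uses the standard library function math.lgamma for log n!; the function names are illustrative only.

```python
import math

def stirling_ratio(n):
    """Ratio n! / (n^n e^-n sqrt(2 pi n)), computed through logarithms."""
    log_exact = math.lgamma(n + 1)  # log n!
    log_stirling = n * math.log(n) - n + 0.5 * math.log(2 * math.pi * n)
    return math.exp(log_exact - log_stirling)

def log_factorial_error(counts):
    """Total error in log a_1! + ... + log a_nu! when Stirling is used."""
    return sum(math.log(stirling_ratio(a)) for a in counts)

def crude_bound(counts):
    """The comparison quantity 1/(10 a_1) + ... + 1/(10 a_nu)."""
    return sum(1.0 / (10 * a) for a in counts)

cell_counts = [50, 80, 120]  # illustrative cell populations
error = log_factorial_error(cell_counts)
bound = crude_bound(cell_counts)
```

With equal populations a in each of ν cells the bound is ν/(10a), so the error is small only when the populations are large compared with the number of cells.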
* Cf. Bertrand, loc. cit., p. 30.




To begin with, it should be noticed that there is no harm
done if, in deducing our general law, we reject a small
number of molecules. In fact we may reject a number
increasing with n, provided that it does not increase as fast
as n. ν depends upon the number of cells; it will remain
constant if the total volume in the u, v, w space remain fixed.
But we have no right to assume that this total volume does
not increase with n, for it is limited only by (3) and the vis
viva clearly increases with the number of molecules. We
shall not go far wrong in assuming that E increases
proportionately with n. In order to keep the volume below
a certain upper limit, we may have to reject a certain number
of molecules of the highest velocity. But if we put E = pn,
we see that the number of molecules, the squares of whose
velocities are greater than p, cannot increase proportionately
with n; hence the number rejected will increase less rapidly
than n, and do no harm. We may choose ν once for all, and
then assume that all of the a's are so large that ν/a_i is small.

The following difficulty is more serious. If we take 
different velocity points in the same cell of u, v, w space, 
their coordinates are not identical. We have the more exact 
equation 

$$\sum_{k=1}^{\nu} \sum_{j=1}^{a_k} \left[ (u_k + \delta_j u_k)^2 + (v_k + \delta_j v_k)^2 + (w_k + \delta_j w_k)^2 \right] - E = 0.$$

We may imagine that every increment δ_j u_k, δ_j v_k, δ_j w_k lies
within the small limits ± θ/2, where θ is very small, thanks to
the size of ν.

The point in the space of 3n dimensions with coordinates
(u_1 + δ_1 u_1), (v_1 + δ_1 v_1), (w_1 + δ_1 w_1),
(u_1 + δ_2 u_1), (v_1 + δ_2 v_1), (w_1 + δ_2 w_1), ...
which represents a distribution of velocities in ν cells, will lie
in a 3n dimensional hypercube of edge θ. The quantity
which we call E is the square of the distance of this point
from the origin and the total variation of this cannot exceed
the length of a diagonal of the small hypercube, θ√(3n).




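The difference of the distances of two points from a third
point is less than their distance from one another; hence
$$\left| \sqrt{\sum_{k=1}^{\nu} a_k (u_k^2 + v_k^2 + w_k^2)} - \sqrt{E} \right| \le \theta \sqrt{3n},$$

$$\sum_{k=1}^{\nu} a_k (u_k^2 + v_k^2 + w_k^2) - \left( \sqrt{E} + \gamma \theta \sqrt{3n} \right)^2 = 0, \qquad -1 \le \gamma \le 1.$$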


We find, as before, 


a = ies (u? +0/? + w/?) 
aN 
Baer k 
3 
(VE+y0V3n) = : 


Al y? 6 


Since 6 is very small, a is a good value for h. 


§ 5. Probability of a Nearly Normal State.


We have seen that the most probable distribution of velocities 
is that given by Maxwell’s law for the normal state, given in 
Theorem 3]. This knowledge does not, however, carry us 
very far, as we have not much idea how likely it is that the 
distribution of velocities in a given gas will be according to 
this law, or nearly so. This difficult problem must now 
claim our attention. 

Let us begin by shifting slightly the number of molecules
in each cell so that the number in the l-th cell is now

$$a_l' = a_l + \alpha_l,$$

$$\sum_{l=1}^{\nu} \alpha_l = 0, \qquad \sum_{l=1}^{\nu} \alpha_l (u_l^2 + v_l^2 + w_l^2) = 0. \qquad (8)$$




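We replace (7) by
$$P' = \frac{n!\, V^n}{a_1'!\, a_2'! \cdots a_\nu'!},$$

$$\log P' = (n + \tfrac{1}{2}) \log n + n \log V - \frac{\nu - 1}{2} \log 2\pi - \sum_{l=1}^{\nu} (a_l + \alpha_l + \tfrac{1}{2}) \log (a_l + \alpha_l)$$

$$= \log P - \sum_{l=1}^{\nu} \left[ \alpha_l \log a_l + (a_l + \alpha_l + \tfrac{1}{2}) \log \left( 1 + \frac{\alpha_l}{a_l} \right) \right]$$

$$= \log P - \sum_{l=1}^{\nu} \left[ \alpha_l \log a_l + \alpha_l + \frac{\alpha_l^2}{2 a_l} \right],$$

nearly.
Since
$$a_l = k e^{-h^2 (u_l^2 + v_l^2 + w_l^2)},$$
by (8),
$$\sum_{l=1}^{\nu} \alpha_l (\log a_l + 1) = \sum_{l=1}^{\nu} \alpha_l \left[ \log k + 1 - h^2 (u_l^2 + v_l^2 + w_l^2) \right] = 0,$$

$$\log P' = \log P - \sum_{l=1}^{\nu} \frac{\alpha_l^2}{2 a_l},$$

$$P' = P e^{-\sum \alpha_l^2 / 2 a_l}.$$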


We wish to find the sum of the values P' for all integral
sets of values α_1 ... α_ν compatible with (8) and with a
condition of size, let us say


t=” 
Dyn, oe 
Ja eS | (10) 
We seek the value of
$$\pi = \sum P'$$
for all integral values of α_1 ... α_ν compatible with (8) and (10).
It is clear that we must, first of all, replace our summation 




by some sort of definite integral. Let the number of groups
of α's be N, the corresponding values of P' being P_1', P_2', ... P_N'.

$$\pi = \sum_{i=1}^{N} P_i'.$$

Consider the space of ν dimensions where a point has
coordinates α_1, α_2, ... α_ν. Equations (8) give us two
hyperplanes in this, and (10) a hyperellipsoid. The section is
a hyperellipsoid in a space of ν-2 dimensions whose volume
we call W. Let each point within this hyperellipsoid with
integral coordinates be enclosed in a separate region of volume


dv; then Nt 
BY =| Rae, 


where the integral is extended throughout the whole of the
region including the point. As N is very large indeed, and
the region small, this is close to the value obtained by
replacing P_i' by the continuous variable P'. Then
po ay 
L = wl? dv, 


approximately, and so 


where this integral is taken over the whole of the hyper- 
ellipsoid. We may re-write this 


d2 

NP -1 y+ 

wap | |é ic da,da,...da,. 
1 U ark 


We next put 3 py 


By altering z we have a set of concentric ellipsoids. If the 
volume of one of these be written f(z), then 


£9 
2 


NP 2 
si * capt (2) dz. 


1G)" 


f @dz=f(2); 


0 


T= 




A change in z is in the nature of a central similitude in
the space of ν dimensions, which carries each of the
hyperplanes (8) into itself, and permutes the hyperellipsoids in this
space. Hence it permutes also the hyperellipsoids in the
space of ν-2 dimensions, and we shall have a relation

y=2 


« f@—=Ke?, 


as we prove by beginning with the case where all of the a’s 
are equal, and then imposing a homogeneous strain. 


O="Q) 
= a BAP 


23) | 


Now if z_1 be allowed to increase indefinitely we shall
eventually include all sets of integral values of α_1, α_2 ... α_ν
compatible with (8) and our probability π becomes a certainty.
Moreover, the integrand becomes tiny when z is very large,
so we assume that we get certainty by integrating out to

infinity. Lastly, cals is proportional to the number of 
Zyne- 
(3) 
integral points divided by the volume of the hyperellipsoid 
and varies little from a fixed constant ~. Hence 


_ Se (11) 
Ho, 




The only quantity here which depends on n is z_1, for we
have already seen that ν may be chosen once for all. If
n increase indefinitely, we may expect a_l to increase about in
proportion. If we allow α_l to increase in about the same
ratio, then z_1 will increase about proportionately with n
and π will approach 1 as a limit.

We can express this more accurately. The quantities a_l
are those which give the most likely distribution, the α_l's
are discrepancies, and the ratios α_l/a_l = β_l are the relative
discrepancies. We have the inequality


l=v 


> 2 aF a1. (12) 


e100 

If we start with the β_l's, then from (12) we may assume the
a_l's to increase proportionately with z_1.

Theorem 4] The probability that the distribution of velocities 
shall differ from the normal one in such a way that 
each relative discrepancy is less than some assigned 
quantity, will approach 1 as a limit if the number of
molecules increase indefinitely and the vis viva increase 
proportionately with them. 

Theorem 5] If the number of molecules be very large, and if
a gas be taken at random, it is practically certain that 
the distribution of velocities will differ but little from 
the normal one. 

Theorem 6] If a large number of gas specimens be examined,
each containing the same number of molecules of the
same size and mass, with the same vis viva and equal
containers, in the vast majority of cases the distribution 
of velocities will differ but little from the normal one. 

We have stated this theorem in three different ways, in 
order to contrast them with another statement which seems 
less legitimate : 

‘This completes our information about the motion of the
gas. At any instant it is infinitely probable that it is in 
the normal state. In the course of the motion departures 
from the normal state will occur, but it is infinitely probable 




that these will occupy but an infinitesimal fraction of the 
time occupied by the motion.’ * 

This conclusion seems unwarranted. Returning to the 
representation by means of points in higher space each repre- 
senting point will trace a trajectory made up of rectilinear 
segments, followed by jumps along the hypersurfaces (2). If 
a large number of representing points be started on their 
journeys at the same moment, a large majority of them will 
always be found in regions corresponding to normal
distributions of velocity. But it does not seem at all clear that
the paths are such that a minority of points may not stay
most of the time in regions corresponding to abnormal
distributions of velocity.† Perhaps the best statement we can
make is the following:‡


Theorem 7] If a gas specimen be chosen at random from
a very large number, all with equal containers and
equal vires vivae, it is immensely probable that it will
have a nearly normal distribution of velocities most of 
the time. 


§ 6. Distribution in Space. 


The work which we have done so far has been exclusively 
on the distribution of velocities ; the question arises naturally 
whether we may not carry on a similar discussion of the 
distribution of the molecules in space. 

We begin by replacing our coordinates u, v, w by x, y, z.
These must all be finite since the gas is supposed to be in 
a finite container. This container we may imagine divided 
into a number of equal cells as before. At this point the 
analogy breaks down. In the case of velocities, starting 
with the assumption that the probability that a gas was in 
a certain state was proportional to the volume of a certain 
region in the space of 6n dimensions, we noted that the
limiting conditions (2) do not involve the velocity coordinates. 

* Jeans, loc. cit., p. 55. 

† This possibility is hinted at ibid., following paragraph.

‡ The discussion of these points in Castelnuovo, loc. cit., pp. 290 ff., is
admirable.




Hence the probability was proportional to the volume of
a region in a 3n dimensional space and that was proportional
to the product of the volumes of n regions in the
three-dimensional space u, v, w. But a similar line of reasoning
is not applicable to the x, y, z coordinates, for they appear in
(2). The matter is even more evident on purely physical
grounds. The velocity coordinates of non-colliding molecules
are totally independent, and are in no danger of ‘crowding’
one another, but the fact that a certain molecule lies in
a certain small region reduces the probability that a second
molecule should be therein. This shows the illegitimacy of
the ‘assumption of molecular chaos’ which is used to deduce
Maxwell’s Law from dynamical considerations. This
assumption may be stated as follows:*

‘It is usual to assume that the molecules having velocity 
components within any specified limits are, at every instant 
throughout the motion of the gas, distributed at random, 
independent of the positions or velocities of the other mole- 
cules, provided, only, that two molecules do not occupy the 
same space.’

The matter assumes quite a different aspect when we 
assume that the diameters of the molecules are negligible. 
Here we retain only those inequalities (2) which have to do 
with the container, and these involve the positions of the 
molecules separately. Equation (3) drops away. We may 
repeat our previous reasoning: where the quantity called h is
equal to 0, the a_l's will all be equal.

Theorem 8] When the radii of the molecules are negligible,
the most likely distribution is a uniform one throughout
the container.

The reasoning previously employed to find P' is still valid:

Theorem 9] When the radii of the molecules are negligible,
there is a very great probability that at any instant
the distribution will be nearly uniform throughout the
container.

Theorem 10] When the radii of the molecules are negligible,
the assumption of molecular chaos is legitimate.


* Jeans, loc. cit., p. 17. 


CHAPTER XI
THE PRINCIPLES OF LIFE INSURANCE * 


§ 1. Calculation of Life Probabilities. 


THE fundamental question on which the whole theory of 
life insurance is based is the probability that a certain in- 
dividual shall survive a certain time. From one point of 
view the statement of this question is nonsense on its face: 
our times are in the hands of God; the probability does not 
exist. But let us remember that from the very start we have 
clung closely to a statistical definition of probability ; that 
definition will stand us in good stead now. The problem 
means essentially this. An individual is classed as a member 
of a recognized category of a sort that has been long under 
observation. What proportion of that category may we, as 
a result of statistical inquiry, expect to survive the time in 
question ? 

The category in which a healthy individual is usually classed, 
for purposes of life insurance, is that of his age. The funda- 
mental problem can be put in the following more exact form: 

‘What is the probability that a healthy individual of age x 
will survive one year?’ The probability that he will survive 
two years is the product of the probability that he will 
survive one year, multiplied by the probability that a person 
one year older will also survive one year, and so on for 
a number of years. The question of how to compute these 
probabilities statistically is ever so much harder than one 
would suppose at first. One would be inclined to say: ‘ Why, 


* The masterwork on the subject of the present chapter is the Institute of 
Actuaries Text-book, Part II, by George King, 2nd edition, London, 1902. See
also Czuber, loc. cit., vol. ii. 




all you have to do is to take a census of a large number
of persons of age x at a certain date, check them up a year
later to see how many are alive, and form the quotient.’
Unfortunately, this is quite impracticable. To begin with,
the category is too elastic. If at a certain date two persons
give their ages as x, one may be 364 days older than the
other, a serious divergence in the later ages. In fact it might
be wiser to assign the age x-1 to the one, or x+1 to the
other. Moreover, unless all of the individuals were soldiers
or convicts or of some other non-representative sort, it is 
impossible to keep track of them all. Some will escape 
observation during the course of the year, and it will be 
impossible to say at the end of that time whether they are 
alive or dead. 

Difficulties of a somewhat different sort arise when we try 
to compute the probability of surviving from birth and death 
statistics. If a man be born in the year 1900 and die in the
year 1925, he may die at the age of 24 or 25. If in the year 
1925 a man give his age as 25 years, he may have been born 
in the year 1899 or the year 1900. If a man born in 1900 
die at the age of 25 years, he may die in 1925 or 1926. 

Further complications arise for an insurance company 
which tries to calculate life probabilities from its own ex- 
perience. Suppose that, at the beginning of a certain year, 
the number of persons insured of a given age is known. 
A year later the books of the company are re-examined and 
the number of persons, ostensibly a year older, is observed. 
The ratio will by no means give the probability for surviving 
one year; the two sets of figures bear on different, if over- 
lapping, categories. Among those who appear in the second 
count are some who did not appear the year before, because 
they took out their first policies during that year. On the 
other hand, of those whose names appear the first year, some 
will disappear, and the company will not know whether they 
survive or not. It is still worse if the company make use 
of the lists of deaths, for those who die during the year at 
the same age will have been born in two different years, 
and will appear under different years in the birth registers, 




and some will have taken out their first insurance in the 
course of the year. 

It is evident that, in view of all these difficulties, no perfect
calculation is possible ; the best we can do is to adopt certain 
arbitrary, if plausible, conventions. To begin with, different 
insurance companies combine their experience. Some of the 
best life tables are those of twenty British companies, and 
the large American companies combine their experience also. 
Secondly, statistics are made up as of January 1, but each 
individual is given a fictitious birthday, the 1st of July
nearest to the date of his actual birth. This corresponds to 
the tolerably reasonable assumption that those who announce 
a certain age on January 1 have their birthdays scattered 
pretty evenly over a twelvemonth. In the same way, it is 
assumed that all who take out or surrender their policies 
during a year do so on July 1. Let L_x be the number of
persons aged x whose names appear on the company’s books
on the 1st of January,* the number giving their age as x+1
a year later shall be L_{x+1}. Let p_x be the chance that a person
aged x will survive one year. Let n_x be the number of new
policy-holders aged x who enter during the year, e_x the
number who left. The probability of surviving half a year


will be about 1 - ½ (1 - p_x) = ½ (1 + p_x).


Now L_{x+1} is made up of the survivors of L_x plus the
surviving immigrants, and less the surviving emigrants, i.e.


$$L_{x+1} = L_x p_x - \frac{e_x}{2} (1 + p_x) + \frac{n_x}{2} (1 + p_x),$$


$$p_x = \frac{L_{x+1} + \tfrac{1}{2} (e_x - n_x)}{L_x - \tfrac{1}{2} (e_x - n_x)}. \qquad (1)$$
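
Formula (1) is easily put into computable form. In the sketch that follows, the function and argument names are illustrative, and the figures are invented merely to show that the formula recovers a survival probability from which the book counts were built under the July 1 convention.

```python
def one_year_survival(L_x, L_next, entrants, exits):
    """Formula (1): p_x from the counts on two successive January firsts.

    L_x      -- persons aged x on the books at the start of the year
    L_next   -- persons aged x+1 on the books a year later
    entrants -- new policy-holders aged x (assumed to enter on July 1)
    exits    -- policy-holders aged x who left (assumed to leave on July 1)
    """
    net_exits = exits - entrants
    return (L_next + 0.5 * net_exits) / (L_x - 0.5 * net_exits)

def survival_over_years(yearly_probabilities):
    """Probability of surviving several years: product of one-year chances."""
    result = 1.0
    for p in yearly_probabilities:
        result *= p
    return result

# Invented illustration: build L_{x+1} from a known p_x, then recover it.
true_p = 0.98
L_x, entrants, exits = 1000, 100, 60
L_next = L_x * true_p + 0.5 * (1 + true_p) * (entrants - exits)
recovered_p = one_year_survival(L_x, L_next, entrants, exits)
```

The second function expresses the remark made earlier, that the chance of surviving two years is the product of the chances for the two successive ages.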


It must not be imagined that after the various p_x's have
been calculated in this way the results are in final shape. It
is clear that no one will be perfectly accurate, not even the
best value obtainable. In fact if we plot each p_x as an
ordinate corresponding to the abscissa x, we have points of


* The notation used throughout this chapter is the universal one adopted 
at the second International Actuarial Congress, London, 1898. 




a broken line that waves up and down. The next step is 
to ‘graduate’ these results, and consists essentially in altering 
the ordinates slightly so that the resulting points lie on 
a smooth curve, which sinks continuously after the years of 
early childhood. This graduation may be accomplished in 
a large number of ways. We may replace the middle one 
of each triad of successive points by the centre of gravity of 
the triangle with these points as vertices. This ‘will bring 
down the mighty from their seat, and exalt the lowly and 
meek’. If need be, the process may be repeated several 
times. Another plan is to take a number of points greater 
than three and find, by least squares, the parabola of vertical 
axis which lies nearest to them. A very simple way is to 
plot the points and then run a smooth curve as near them as 
possible with the aid of a spline or some other instrument. 
Practical actuaries seem to find this method as good as 
any other.* 
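
The first method of graduation described above may be sketched as follows. Since the ages are equally spaced, the centre of gravity of the triangle has for ordinate the mean of the three ordinates; the end values are left unaltered here, which is an assumption of this sketch and not a rule laid down in the text.

```python
def graduate_once(p):
    """Replace each interior value by the mean of itself and its neighbours."""
    if len(p) < 3:
        return list(p)
    smoothed = list(p)
    for i in range(1, len(p) - 1):
        # Ordinate of the centre of gravity of three successive points.
        smoothed[i] = (p[i - 1] + p[i] + p[i + 1]) / 3.0
    return smoothed

def graduate(p, passes=1):
    """Repeat the process several times if need be."""
    values = list(p)
    for _ in range(passes):
        values = graduate_once(values)
    return values

def roughness(p):
    """Sum of absolute second differences, a crude measure of waviness."""
    return sum(abs(p[i - 1] - 2 * p[i] + p[i + 1]) for i in range(1, len(p) - 1))

# Invented, deliberately wavy survival probabilities.
raw = [0.990, 0.984, 0.989, 0.981, 0.986, 0.978, 0.980]
smooth = graduate(raw, passes=2)
```

Points already lying on a straight line are untouched by the process, while a broken line that waves up and down is flattened at each pass.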

An ideal way to calculate the probability of survival would 
be to find an explicit function for p_x. Various attempts have
been made to find some such function, the most successful
being that of the English actuary Makeham, whose method
we shall now explain.†

Let l_x be the number of persons, all born at practically the
same time, who reach the age x. The probability of surviving
one year is
$$p_x = \frac{l_{x+1}}{l_x}. \qquad (2)$$

Let -Δl_x be the number of persons of the category x
who die in a short space of time Δx thereafter. Then the
instantaneous death-rate, called the ‘instantaneous force of
mortality’, is

$$\mu_x = \lim_{\Delta x \to 0} \frac{-\Delta l_x}{l_x\, \Delta x} = -\frac{d \log l_x}{dx}. \qquad (3)$$

According to Makeham’s assumption, death will arise from 
one of two general causes. The first is accident, and may be 


* Cf. Czuber, loc. cit., vol. ii, pp. 167-200. 
† Makeham, Journal of the (British) Institute of Actuaries, London, Jan. 1860.
Unfortunately, I have not been able to verify this reference.




looked at as a constant throughout, for younger men are 
more active than old ones, and have greater recuperative 
power, but also take more risks. The second is decrease in 
power to resist disease. If we overlook the accidental deaths 
for a moment, we might fairly assume that the rate at which 
people were dying at any instant was inversely proportional 
to a function f(x) which represents the force of resistance to 
disease. Hence 


$$\mu_x = A + \frac{B}{f(x)}.$$


With regard to f(x), Makeham assumes that in any short 
interval a man loses a constant proportion of such force of 
resistance as he still has. 

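We now have the data necessary to calculate the number
of living l_x; we change constants at pleasure throughout our
integration


$$\frac{d f(x)}{f(x)} = -\rho\, dx,$$

$$f(x) = C e^{-\rho x},$$
$$\mu_x = A + B c^x,$$

$$\log l_x = -A x - D c^x - F,$$
$$l_x = k s^x g^{c^x}, \qquad (4)$$
$$p_x = s\, g^{c^x (c - 1)}. \qquad (5)$$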


A simpler formula was devised some time earlier by
Gompertz.* He overlooked the element of chance, and
therefore made A = 0 and s = 1, thus getting


$$l_x = k g^{c^x}, \qquad p_x = g^{c^x (c - 1)}. \qquad (6)$$
We may find the values of the constants in Makeham’s
formula from four observations as follows:
$$\log l_x = \log k + x \log s + c^x \log g,$$
$$\log l_{x+t} = \log k + (x + t) \log s + c^t c^x \log g,$$
$$\log l_{x+2t} = \log k + (x + 2t) \log s + c^{2t} c^x \log g,$$
$$\log l_{x+3t} = \log k + (x + 3t) \log s + c^{3t} c^x \log g.$$


* Gompertz, ‘On the Nature of the Function expressive of the Law of 
Human Mortality’, Philosophical Transactions, Royal Society, 1826. 


The first differences are
$$\Delta \log l_x = t \log s + (c^t - 1)\, c^x \log g,$$
$$\Delta \log l_{x+t} = t \log s + c^t (c^t - 1)\, c^x \log g,$$

$$\Delta \log l_{x+2t} = t \log s + c^{2t} (c^t - 1)\, c^x \log g.$$

The second differences are
$$\Delta^2 \log l_x = (c^t - 1)^2 c^x \log g,$$
$$\Delta^2 \log l_{x+t} = c^t (c^t - 1)^2 c^x \log g,$$

$$\log (\Delta^2 \log l_{x+t}) - \log (\Delta^2 \log l_x) = t \log c.$$

The other constants are then easily found.
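
The determination of Makeham's constants from four observations may be sketched thus. The constants chosen are invented for illustration. Since g is less than 1 the second differences are both negative, so the sketch takes their ratio, which equals c^t, rather than the logarithm of each separately; this is a small departure from the form of the last equation above.

```python
import math

def makeham_l(x, k, s, g, c):
    """Number living at age x under Makeham's law l_x = k s^x g^(c^x)."""
    return k * s ** x * g ** (c ** x)

def fit_makeham(x, t, l_values):
    """Recover (k, s, g, c) from l_x, l_{x+t}, l_{x+2t}, l_{x+3t}."""
    log_l = [math.log(v) for v in l_values]
    d1 = [log_l[j + 1] - log_l[j] for j in range(3)]  # first differences
    d2 = [d1[j + 1] - d1[j] for j in range(2)]        # second differences
    ct = d2[1] / d2[0]             # ratio of second differences is c^t
    c = ct ** (1.0 / t)
    log_g = d2[0] / ((ct - 1.0) ** 2 * c ** x)
    log_s = (d1[0] - (ct - 1.0) * c ** x * log_g) / t
    log_k = log_l[0] - x * log_s - c ** x * log_g
    return math.exp(log_k), math.exp(log_s), math.exp(log_g), c

# Invented constants, used only to manufacture four exact observations.
true_constants = (1000.0, 0.995, 0.9995, 1.09)
observed = [makeham_l(30 + 10 * j, *true_constants) for j in range(4)]
k, s, g, c = fit_makeham(30, 10, observed)
```

With exact data the four constants are recovered to within rounding; with real tables the result depends on which four ages are chosen, which is why the method of least squares is preferable.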


Another, and better, plan is to use all available data and 
determine the constants by least squares. We write 


$$\log p_x = \log s + c^x (c - 1) \log g.$$


The constants, of which logs and (c—1)logg appear 
linearly, may be found by the methods explained in Ch. IX.* 

How much confidence should we place in Makeham’s 
formula? It is evident that the assumptions on which it is 
based are nothing more than reasonably plausible. The test 
is whether it really checks up in practice. This is the case 
to a really surprising extent. A life table calculated by 
Makeham’s formula is better than any but the very best 
table calculated by other means. This is strikingly brought 
out by the following figures,† where, unfortunately, we have
available not Makeham’s formula but the less accurate one 
of Gompertz. The values tabulated are for /,: 


 x    Gompertz   20 British   Duvillard   Deparcieux   Northampton
      formula    Companies
30      890         890          890          890          890
40      839         813          750          797          737
50      745         718          603          704          579
60      584         584          434          561          413
70      355         382          238          375          250
80      125         142           71          148           95


If we assume that the best tables are those of 20 British 
Companies, we see that this table shows that the Gompertz 
figures are as accurate as Deparcieux, and distinctly better 


* Czuber, loc. cit., vol. ii, pp. 181 ff. 
† Bertrand, loc. cit., p. 818.




than Duvillard or Northampton. As a matter of fact, 
Makeham tables are used in practice. As for the Gompertz 
formula, that is useful for calculating the probabilities for 
contingencies depending on two lives. We see from (6) that
the probability that a person aged x and another aged x'
should both survive is p_z, where


$$c^z = c^x + c^{x'}.$$


This formula would not hold for two persons intimately 
connected, like husband and wife, for the death of one would 
be likely to hasten the death of the other. 


§ 2. Endowments and Annuities.


Before taking up the subject-matter of the present section, 
we must say a word or two about the mathematics of finance. 
In calculating all sorts of insurance values, it must be
understood that the word ‘interest’ always means ‘compound
interest’. If, thus, i be the rate (usually in the neighbourhood
of 3½ per cent.), the amount of $1 at the end of n years is

(1+i)^n.

To find the present worth of $1 payable at the end of that
time, we write (1+i)^{-1} = v; the present worth, or discounted
value, is v^n.

With regard to calculating compound interest, we note that
if the interest be compounded m times per year, the amount
at the end of n years will be

$$\left( 1 + \frac{i}{m} \right)^{mn}.$$

Allowing m to increase indefinitely, the amount at interest
continuously compounded is e^{in}; if we put e^{i'} = (1+i) we see
that compound interest or present worth can be reckoned by
assuming the new rate i' with continuous compounding.
This new rate is called the ‘force of interest’.
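
These relations between periodic compounding, continuous compounding, and the force of interest may be illustrated by a short sketch. The function names are illustrative, and the rate of 3½ per cent. is taken only because the text mentions it as usual.

```python
import math

def amount(i, n, m=1):
    """Amount of 1 after n years at rate i compounded m times per year."""
    return (1.0 + i / m) ** (m * n)

def continuous_amount(i, n):
    """Limit of the above as m increases indefinitely: e^(i n)."""
    return math.exp(i * n)

def force_of_interest(i):
    """The rate i' for which e^(i') = 1 + i."""
    return math.log(1.0 + i)

def present_worth(i, n):
    """Discounted value v^n of 1 payable n years hence, with v = 1/(1+i)."""
    v = 1.0 / (1.0 + i)
    return v ** n

rate = 0.035
yearly = amount(rate, 10)
monthly = amount(rate, 10, m=12)
limit = continuous_amount(rate, 10)
delta = force_of_interest(rate)
```

The amount grows with the frequency of compounding and approaches the continuous value, while the force of interest, applied continuously, reproduces yearly compounding exactly.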

A sum of money to be paid at the end of a certain time, 
provided that a stated individual is still alive, is called an 
‘endowment’. What is the present value of $1 payable at 



the end of n years, in case a person now aged x is then alive?


$${}_nE_x = v^n\, \frac{l_{x+n}}{l_x} = \frac{v^{x+n}\, l_{x+n}}{v^x\, l_x}.$$
Every practising actuary is provided with a series of tables 
called ‘commutation tables’ which contain the fundamental 
data needed for his purposes.* The first column in such 


tables contains the age x, the second


$$D_x = v^x l_x, \qquad (7)$$
so that the fundamental endowment formula is
$${}_nE_x = D_{x+n} / D_x. \qquad (8)$$


A sum of money which shall be paid at the end of each
year that an individual survives is called an ‘Annuity’; the
value of an annuity of $1 based on the life of an individual
aged x is
$$a_x = \frac{D_{x+1} + D_{x+2} + \dots}{D_x}.$$

Sometimes it is required that the first payment shall be 
made immediately. In that case the sum is called an 
‘ Annuity due’; the Germans have more sonorous titles, calling 
them annuities ‘postnumerando’ and ‘praenumerando’ re- 
spectively. In the third column of the commutation tables 
are the quantities 


$$N_x = \sum_{i=1}^{\omega - x} D_{x+i}, \qquad (9)$$
where ω is the age, usually about 100, where an individual
may reasonably be supposed to be certainly dead. We have,
then, for an annuity


$$a_x = N_x / D_x, \qquad (10)$$
and for an annuity due†
$$1 + a_x = N_{x-1} / D_x. \qquad (11)$$
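
The use of the commutation columns may be sketched on a small scale. The life table below is invented for illustration and is far too short for practice; it is assumed that the ages are consecutive integers, and the column N is taken in the sense of (9), as the sum of the D's for ages above x.

```python
def commutation_columns(l, i):
    """Build D_x = v^x l_x and N_x = D_{x+1} + D_{x+2} + ... from a life table."""
    v = 1.0 / (1.0 + i)
    ages = sorted(l)
    D = {x: v ** x * l[x] for x in ages}
    N = {}
    running = 0.0
    for x in reversed(ages):
        N[x] = running       # sum of D over ages strictly greater than x
        running += D[x]
    return D, N

def endowment(D, x, n):
    """Present value of 1 payable in n years if a person aged x be alive."""
    return D[x + n] / D[x]

def annuity(D, N, x):
    """Annuity of 1 with first payment a year hence: N_x / D_x."""
    return N[x] / D[x]

def annuity_due(D, N, x):
    """Annuity of 1 with first payment immediately: N_{x-1} / D_x."""
    return N[x - 1] / D[x]

# Invented table: number living at each age, all dead at age 100.
toy_l = {95: 1000, 96: 700, 97: 450, 98: 250, 99: 100, 100: 0}
D, N = commutation_columns(toy_l, 0.035)
a_96 = annuity(D, N, 96)
```

Once the columns are tabulated, each of the three values is a single division, which is the whole purpose of the tables.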


It is interesting to compare the value of a_x with that


* Cf. King, loc. cit., pp. 512-45.
† This is the standard notation. Some authors, as Czuber, loc. cit., write
$$N_x = \sum_{i=0}^{\omega - x} D_{x+i},$$
and a_x where we write 1+a_x. Thus they write (10), meaning (11).




of a certain payment to be made yearly during the season
e_x, the expected life of the individual, i.e. the mean life
of one of his age,

$$e_x = \frac{l_{x+1} + l_{x+2} + \dots + l_\omega}{l_x}. \qquad (12)$$


The value of the certain payment is


$$v + v^2 + v^3 + \dots + v^{e_x} = v\, \frac{1 - v^{e_x}}{1 - v} = \frac{1}{i} \left\{ 1 - (1 + i)^{-e_x} \right\}.$$

We wish to compare this with


$$a_x = \frac{v\, l_{x+1} + v^2 l_{x+2} + \dots}{l_x}$$


eS pitt 4 pres ay pats tee sep esate aa 
l, be ly ty 


To prove that this is less than the other, we must show 


ol ae v(a- St) ey v(4 ne mH) 4. 
ae or te 
+0%(1 ss ay 
ly 


The right-hand side is greater than 


a (Ce ea Cao es ae) 


=3 = eet thee 
= , ] 
© 


a yeatl (eee S| ’ 
= eren 


Problem. 
Find the value of {}_{|n}a_x, an annuity limited to n payments, of {}_{m|}a_x, an
annuity whose first payment will be at the end of m years, and of {}_{m|n}a_x,
which is both limited and postponed.




The inequality is thus established, and the certain pay- 
ment has the greater value. 

It is sometimes important to know the value of an annuity
which increases each year. Let the first payment be $1, the 
second $2, the third $3, and so on. We have here, really, an 
annuity for $1, another for the same sum deferred one year, 
a third deferred two years, and so on. The value is thus 


$$\frac{D_{x+1} + D_{x+2} + \dots}{D_x} + \frac{D_{x+2} + D_{x+3} + \dots}{D_x} + \dots = \frac{1}{D_x} \left[ N_x + N_{x+1} + N_{x+2} + \dots \right].$$


The fourth column in the commutation table is


$$S_x = \sum_{t=0}^{\omega - x} N_{x+t}. \qquad (13)$$

The value of our complicated annuity is then


$$S_x / D_x. \qquad (14)$$


§ 3. Single Payment Insurance. 


There are two sorts of benefits which a Life Insurance 
Company is called upon to pay: annuities if people survive, 
insurance if they die. We have calculated the values of the 
principal types of the former benefit; we must now calculate
the latter. In passing, we note that whereas when a man 
wishes to take out an insurance policy he must pass a careful 
physical examination, and give evidence that he does not 
follow an unusually hazardous calling, when it comes to
annuities, the worse a man’s health and the more dangerous 
his calling, the better the Company will be pleased. 

What is the present value of $1 to be paid at the end of 
the year in which a person aged x dies? We call this A_x,
and note that we have various mutually exclusive possibilities 
that he may die in any one of the succeeding years. 




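$$A_x = v \left( 1 - \frac{l_{x+1}}{l_x} \right) + v^2\, \frac{l_{x+1}}{l_x} \left( 1 - \frac{l_{x+2}}{l_{x+1}} \right) + v^3\, \frac{l_{x+2}}{l_x} \left( 1 - \frac{l_{x+3}}{l_{x+2}} \right) + \dots$$

$$= v (1 + a_x) - a_x$$

$$= \frac{v N_{x-1} - N_x}{D_x}.$$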


This formula might have been predicted by the following 
reasoning. The Company agrees that if the man be alive at 
the beginning of any year it will pay $1 either to him or to 
his ‘heirs or assigns’ at the end of the year. The man 
agrees that if he be alive at the end of the year, he will pay 
that dollar back. The man agrees to pay the Company an 
annuity of $1, the Company agrees to pay an annuity due 
for the same amount, but as each payment is postponed a year, 
the whole must be discounted once. The difference between 
these two benefits gives the formula above. It must be 
added, however, that this is not the best type of formula for 
computation. We shall add to our commutation table columns 
based on those who die, not on those who survive. The 
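number who die at the age x is


$$d_x = l_x - l_{x+1}. \qquad (15)$$
$$A_x = \frac{v\, d_x}{l_x} + \frac{v^2 d_{x+1}}{l_x} + \dots = \frac{v^{x+1} d_x + v^{x+2} d_{x+1} + \dots}{v^x l_x}.$$
Let
$$v^{x+1} d_x = C_x, \qquad (16)$$
$$M_x = \sum_{t=0}^{\omega - x} C_{x+t}, \qquad (17)$$
$$A_x = M_x / D_x. \qquad (18)$$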
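
The agreement of the two expressions for the single payment insurance, the one by deaths and the other by annuities, may be verified numerically. The life table is again invented, with consecutive ages, and it is assumed that no one survives the last age tabulated; without that assumption the two expressions differ by a final term.

```python
def insurance_by_deaths(l, i, x):
    """A_x as the sum of v^(k+1) d_{x+k} / l_x over the years of possible death."""
    v = 1.0 / (1.0 + i)
    last = max(l)
    total = 0.0
    for k in range(last - x):
        deaths = l[x + k] - l[x + k + 1]   # d_{x+k} = l_{x+k} - l_{x+k+1}
        total += v ** (k + 1) * deaths
    return total / l[x]

def annuity_value(l, i, x):
    """a_x computed directly as the sum of v^k l_{x+k} / l_x for k >= 1."""
    v = 1.0 / (1.0 + i)
    last = max(l)
    return sum(v ** k * l[x + k] for k in range(1, last - x + 1)) / l[x]

def insurance_by_annuities(l, i, x):
    """A_x = v (1 + a_x) - a_x."""
    v = 1.0 / (1.0 + i)
    a_x = annuity_value(l, i, x)
    return v * (1.0 + a_x) - a_x

# Invented table: number living at each age, all dead at age 100.
toy_l = {95: 1000, 96: 700, 97: 450, 98: 250, 99: 100, 100: 0}
A_deaths = insurance_by_deaths(toy_l, 0.035, 95)
A_annuities = insurance_by_annuities(toy_l, 0.035, 95)
```

The two values coincide, and the value lies between 0 and 1, as the present worth of a dollar certain to be paid at some future date must do.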


Let us, lastly, suppose that the amount of the insurance 
will be $1 if the man die in the first year, $2 if in the 
second, and so on. We have 


$$\frac{1}{D_x} \left[ M_x + M_{x+1} + \dots \right].$$




We make a last column in our commutation tables,


$$R_x = \sum_{t=0}^{\omega - x} M_{x+t}; \qquad (19)$$
the value of this increasing insurance is then
$$R_x / D_x. \qquad (20)$$


A much more frequent form of contract is the so-called 
‘endowment’ policy, where the Company agrees to insure 
a life for n years, and to pay the amount at the end of that 
time in case the person is still alive. This is clearly the sum 
of an insurance limited to a certain number of years, and an 
endowment postponed the same number of years, namely 

$$A_{x:\overline{n}|} = (M_x - M_{x+n} + D_{x+n}) / D_x. \qquad (21)$$

This may be transformed in a number of ways which we 
shall not stop to explain. 

We have, so far, assumed that the insurance would be paid 
at the end of the year of death; that is not an arrangement 
which usually commends itself in practice, most beneficiaries 
not caring to wait so long. 

Suppose that an annuity of $1 is to be paid in m equal 
instalments. Its value is increased, partly because the 
beneficiary receives his money earlier each year, partly because 
he receives some payments in the year of death. The present 
value of the last payment to be made each year is a,,/m. 

An annuity due of 1/m payable at the beginning of each 
year would have the value 


1 
5am (1 + ay). 


Let us assume that the intervening benefits decrease pro- 
portionately ; the total value of the annuity is now 


    $\dfrac{1}{m}\,a_x + \dfrac{1}{m}\left(a_x + \dfrac{1}{m}\right) + \dfrac{1}{m}\left(a_x + \dfrac{2}{m}\right) + \dots,$

    $a_x^{(m)} = a_x + \dfrac{m-1}{2m}.$    (22)
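
The averaging that leads to (22) can be verified with exact fractions. This is a small sketch, not taken from the text; the function name and the sample annuity value of 12 are hypothetical.

```python
from fractions import Fraction


def annuity_in_instalments(a_x, m):
    """Sum the m instalment values (1/m)(a_x + k/m) for k = 0, 1, ..., m-1."""
    return sum(Fraction(1, m) * (a_x + Fraction(k, m)) for k in range(m))


a_x = Fraction(12)  # hypothetical value of the yearly annuity, for illustration only
for m in (2, 4, 12):
    total = annuity_in_instalments(a_x, m)
    # The sum collapses to a_x + (m - 1) / (2m), as in formula (22).
    print(m, total, a_x + Fraction(m - 1, 2 * m))
```

The correction term depends on $m$ alone, not on the age, so a single column of annuity values serves for every mode of payment.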
Problem.
Calculate the value of ${}_{m|}A_x$, an insurance where the liability begins
only after $m$ years, of ${}_{|m}A_x$, where it ceases after $m$ years, and of ${}_{m|n}A_x$,
where it is limited both ways.


SINGLE PAYMENT INSURANCE 201 


For a continuous annuity we should have 
    $\bar{a}_x = a_x + \tfrac{1}{2}.$    (23)


In the same way, an insurance policy receives an enhanced
value if the payment be made at the end of a stated fraction
of the year, say $1/m$th, in which death occurs, for the Company
loses interest on its money from the end of that term to the
end of the year. Assuming, for simplicity, that the probability
of death is the same throughout all intervals of the year,
a rather inaccurate assumption, and that the interest lost is
only simple interest, the loss to the Company is


    $\sum_{r=1}^{r=m} \dfrac{1}{m}\cdot\dfrac{m-r}{m}\, i = \dfrac{m-1}{2m}\, i;$

hence the value under the present contract is

    $A_x^{(m)} = A_x \left(1 + \dfrac{m-1}{2m}\, i\right).$    (24)

For immediate payment at death, we have

    $\bar{A}_x = A_x \left(1 + \dfrac{i}{2}\right).$    (25)
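
The interest lost, on the two simplifying assumptions just named, is an average over the $m$ intervals of the year. The following sketch, with a hypothetical function name, confirms the closed form used in (24) and its limit in (25).

```python
from fractions import Fraction


def lost_interest_fraction(m):
    """Average, over m equally likely intervals, of the part of the year
    remaining after the end of the interval in which death occurs."""
    return sum(Fraction(1, m) * Fraction(m - r, m) for r in range(1, m + 1))


for m in (1, 2, 4, 12, 365):
    # Multiply by the yearly rate i to obtain the simple interest lost.
    print(m, lost_interest_fraction(m), float(lost_interest_fraction(m)))
```

The fraction is $(m-1)/2m$ and tends to $\tfrac12$ as $m$ increases, which gives the factor $1 + i/2$ for immediate payment.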


§ 4. Premiums. 


In all the calculations made so far we have merely con- 
sidered the present value of the benefit to be obtained. In 
the majority of cases, however, the beneficiary is by no means 
in a position to pay down at once the full value of his benefit, 
but arranges for payments at stated intervals. Suppose, for 
instance, instead of making a single payment for a simple 
life policy, the beneficiary wishes to make equal annual 
payments, beginning immediately, as long as he lives. What 
he undertakes to do is, thus, to pay to the Company an 
annuity due for the amount of the premium $P_x$, and this must
have the same present value as the insurance, hence


    $P_x (1 + a_x) = A_x,$

    $P_x = \dfrac{M_x}{N_{x-1}}.$    (26)
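
The balance expressed by (26) can be tested on a toy table. The sketch is illustrative and rests on assumptions not found in this passage: a hypothetical life table and rate of interest, $D_x = v^x l_x$, and $N_x = D_{x+1} + D_{x+2} + \dots$, so that the annuity due is $N_{x-1}/D_x$.

```python
# Hypothetical life table (not from the text): deaths spread evenly up to age 100.
LIVING = [1000 * (100 - age) for age in range(101)]  # LIVING[x] stands for l_x
OMEGA = len(LIVING) - 1                              # limiting age, where l_x = 0
V = 1 / 1.035                                        # assumed 3.5 per cent interest


def D(x):
    """D_x = v^x l_x."""
    return V**x * LIVING[x]


def M(x):
    """M_x = sum of C_y = v^(y+1) d_y from y = x upward."""
    return sum(V ** (y + 1) * (LIVING[y] - LIVING[y + 1]) for y in range(x, OMEGA))


def N(x):
    """N_x = D_{x+1} + D_{x+2} + ... (assumed convention)."""
    return sum(D(y) for y in range(x + 1, OMEGA + 1))


def annual_premium(x):
    """Formula (26): P_x = M_x / N_{x-1}."""
    return M(x) / N(x - 1)


def annuity_due_direct(x):
    """1 + a_x: a payment of 1 now and at each birthday reached."""
    return sum(V**t * LIVING[x + t] for t in range(OMEGA - x + 1)) / LIVING[x]


age = 35
premium = annual_premium(age)
print(premium * annuity_due_direct(age), M(age) / D(age))  # both sides of P(1 + a) = A
```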


The premium for a policy payable immediately at death is


    $\bar{P}_x = P_x \left(1 + \dfrac{i}{2}\right).$


When the beneficiary wishes to limit himself to at most
$n$ payments, we have

    ${}_{n}P_x \left(1 + {}_{|n-1}a_x\right) = A_x,$

    ${}_{n}P_x = \dfrac{M_x}{N_{x-1} - N_{x+n-1}}.$    (27)


The annual premium for an $n$-year endowment policy
will be

    $P_{x\,\overline{n}|} = \dfrac{M_x - M_{x+n} + D_{x+n}}{N_{x-1} - N_{x+n-1}}.$    (28)
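
Formulas (27) and (28) differ from (26) only in the annuity that pays for the benefit. A minimal sketch follows; the life table, the rate, the age and term, and the names are hypothetical, with $D_x = v^x l_x$ and $N_x = D_{x+1} + D_{x+2} + \dots$ assumed.

```python
# Hypothetical life table (not from the text): deaths spread evenly up to age 100.
LIVING = [1000 * (100 - age) for age in range(101)]  # LIVING[x] stands for l_x
OMEGA = len(LIVING) - 1                              # limiting age, where l_x = 0
V = 1 / 1.035                                        # assumed 3.5 per cent interest


def D(x):
    """D_x = v^x l_x."""
    return V**x * LIVING[x]


def M(x):
    """M_x = sum of C_y = v^(y+1) d_y from y = x upward."""
    return sum(V ** (y + 1) * (LIVING[y] - LIVING[y + 1]) for y in range(x, OMEGA))


def N(x):
    """N_x = D_{x+1} + D_{x+2} + ... (assumed convention)."""
    return sum(D(y) for y in range(x + 1, OMEGA + 1))


def limited_payment_premium(x, n):
    """Formula (27): simple life policy paid for by at most n premiums."""
    return M(x) / (N(x - 1) - N(x + n - 1))


def endowment_premium(x, n):
    """Formula (28): annual premium for an n-year endowment policy."""
    return (M(x) - M(x + n) + D(x + n)) / (N(x - 1) - N(x + n - 1))


def limited_annuity_due_direct(x, n):
    """Value of n payments of 1, the first made at once, each contingent on survival."""
    return sum(V**t * LIVING[x + t] for t in range(n)) / LIVING[x]


print(M(35) / N(34), limited_payment_premium(35, 20), endowment_premium(35, 20))
```

As one would expect, limiting the number of payments raises the premium, and adding the endowment raises it again.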


A not uncommon practice in the case of both insurance 
and endowments is to arrange that the premium or premiums 
shall all be returned with the benefit. It is not quite clear 
why any one should desire this type of policy, except that 
it has the appearance of giving the beneficiary something for 
nothing, which is always popular. Let us begin with the 
simplest case, and find the single premium for an endowment, 
which shall give the beneficiary, if alive, the sum of $1 plus 
the premium. We have 


    $\pi = (1 + \pi)\,\dfrac{D_{x+n}}{D_x},$

    $\pi = \dfrac{D_{x+n}}{D_x - D_{x+n}}.$    (29)


What will be the single premium for a simple life policy, 
premium to be returned with the insurance ? 


    $\pi = (1 + \pi)\,\dfrac{M_x}{D_x},$

    $\pi = \dfrac{M_x}{D_x - M_x}.$    (30)
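
In both (29) and (30) the premium appears on each side of the equation, since it is itself part of the benefit. The sketch below, on a hypothetical life table with assumed names and an assumed rate of interest, solves for the premium and substitutes it back.

```python
# Hypothetical life table (not from the text): deaths spread evenly up to age 100.
LIVING = [1000 * (100 - age) for age in range(101)]  # LIVING[x] stands for l_x
OMEGA = len(LIVING) - 1                              # limiting age, where l_x = 0
V = 1 / 1.035                                        # assumed 3.5 per cent interest


def D(x):
    """D_x = v^x l_x."""
    return V**x * LIVING[x]


def M(x):
    """M_x = sum of C_y = v^(y+1) d_y from y = x upward."""
    return sum(V ** (y + 1) * (LIVING[y] - LIVING[y + 1]) for y in range(x, OMEGA))


def endowment_single_premium_returned(x, n):
    """Formula (29): endowment of 1 plus the premium, paid if alive after n years."""
    return D(x + n) / (D(x) - D(x + n))


def life_single_premium_returned(x):
    """Formula (30): simple life policy of 1 plus the premium."""
    return M(x) / (D(x) - M(x))


p_endowment = endowment_single_premium_returned(35, 20)
p_life = life_single_premium_returned(35)
# Each premium equals the value of a benefit of (1 + premium).
print(p_endowment, (1 + p_endowment) * D(55) / D(35))
print(p_life, (1 + p_life) * M(35) / D(35))
```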


If immediate payment be required, we must multiply $M_x$
by $1 + \dfrac{i}{2}$.


Problem.
Calculate the increased cost of $P_{x\,\overline{n}|}$ for immediate payment at death.


Single premium for $n$-year endowment policy,
premium to be returned:


    $\pi = (1 + \pi)\,\dfrac{M_x - M_{x+n} + D_{x+n}}{D_x},$

    $\pi = \dfrac{M_x - M_{x+n} + D_{x+n}}{D_x - M_x + M_{x+n} - D_{x+n}}.$    (31)


Annual premium on simple life policy, all premiums to be 
returned. Here the payment side is an annuity due of the 
amount of the premium. The benefit side is two policies, one
for $1, the other of increasing amount, starting with the 
premium, and increasing by that amount every year. We 
get from (11), (18), and (20) 


    $\pi\,\dfrac{N_{x-1}}{D_x} = \dfrac{M_x}{D_x} + \pi\,\dfrac{R_x}{D_x},$

    $\pi = \dfrac{M_x}{N_{x-1} - R_x}.$    (32)
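
The balance behind (32) can be checked from first principles: premiums paid as an annuity due on one side, and on the other a benefit of $1 together with every premium so far paid. This is a sketch only; the life table, the rate, and the names are hypothetical, with $D_x = v^x l_x$ and $N_x = D_{x+1} + D_{x+2} + \dots$ assumed.

```python
# Hypothetical life table (not from the text): deaths spread evenly up to age 100.
LIVING = [1000 * (100 - age) for age in range(101)]  # LIVING[x] stands for l_x
OMEGA = len(LIVING) - 1                              # limiting age, where l_x = 0
V = 1 / 1.035                                        # assumed 3.5 per cent interest


def D(x):
    return V**x * LIVING[x]                          # D_x = v^x l_x


def M(x):
    return sum(V ** (y + 1) * (LIVING[y] - LIVING[y + 1]) for y in range(x, OMEGA))


def N(x):
    return sum(D(y) for y in range(x + 1, OMEGA + 1))  # N_x = D_{x+1} + ...


def R(x):
    return sum(M(y) for y in range(x, OMEGA))        # R_x = M_x + M_{x+1} + ...


def premium_with_return(x):
    """Formula (32): annual premium, all premiums returned with the insurance."""
    return M(x) / (N(x - 1) - R(x))


def value_of_premiums(x, premium):
    """Annuity due of the premium, each payment contingent on survival."""
    return premium * sum(V**t * LIVING[x + t] for t in range(OMEGA - x)) / LIVING[x]


def value_of_benefit(x, premium):
    """Death in year t+1 brings 1 plus the t+1 premiums already paid."""
    return sum(
        (1 + (t + 1) * premium) * V ** (t + 1) * (LIVING[x + t] - LIVING[x + t + 1])
        for t in range(OMEGA - x)
    ) / LIVING[x]


p = premium_with_return(35)
print(p, value_of_premiums(35, p), value_of_benefit(35, p))
```

The premium so found is necessarily larger than the simple premium $M_x/N_{x-1}$, since the denominator has been reduced by $R_x$.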


Let us look a little more closely into the wisdom of stipu- 
lating that the premiums shall be returned. Let us take this 
last case of simple life policy, premium to be returned. The 
beneficiary’s expectation is here 


    $1 + \pi e_x;$

the ratio of benefit expected to premium is, by (32),

    $\dfrac{N_{x-1} - R_x}{M_x} + e_x.$

The ratio of benefit expected to premium, when premiums

are not returned, is $N_{x-1}/M_x$.
Problems. 


1. Find single premium for n-year endowment policy, with return of 
premiums, if payment be made immediately after death. 

2. Find premium for n-year endowment policy, all premiums to be 
returned.


To compare these, we must compare $R_x/M_x$ with $e_x$.
Turning to a 3½ per cent. commutation table, we find the
figures:


    x      $R_x/M_x$     $e_x$
    25        30          29
    40        22          20
    60        13           9


As the ratio of expected benefit to premium is greater in 
the simple case than where premiums are returned, the former 
would seem to be the better for the beneficiary. 

At this point it is necessary to emphasize in the strongest 
terms the fact that the premiums which we have calculated, 
differ very widely from those charged in practice by com- 
mercial insurance companies. These net premiums fail to 
provide any reserve to meet the following contingencies : 

(1) Cost of operation, and interest on capital invested. 

(2) Fluctuations in the death-rate. 

(3) Fluctuation from the theoretical number of deaths, 
according to Bernoulli’s theorem. 

(4) Decrease in rate of interest obtainable on invested funds. 

In order to meet these various contingencies, the premiums 
are usually ‘loaded’ to a greater or less extent. The different 
companies do not announce to the world the different bases 
which they take for calculating this loading, and this reticence 
is very natural, but the result is a rather remarkable diversity 
in practice. As an example, we quote a few figures, the 
supposed age of the insured being 35 years: 


                      Net Premium.   P.D.Q. Company.   X.Y.Z. Company.

20 payment life          0.0311          0.08328           0.08834
20 year endowment        0.0422          0.0467            0.05147


The net figures are calculated from a 3½ per cent. commutation
table, the others from a card published by the
P. D. Q. Company showing how much less its rates were than 
those of some score of competitors. The X. Y. Z. was chosen 
for comparison because of its high premiums and great size. 
The great difference in the premiums is doubtless explained
in large measure by differences in systems of loading. Thus,
some companies follow the plan of loading the first premiums
very heavily, then dividing large slices of profit among the
policy-holders. Insurance companies doing this have a habit 
of employing such adjectives as ‘mutual’ or ‘co-operative’ to 
describe themselves. We quote from the card where these 
figures are found : 

‘The P.D.Q. Company is distinguished for low rates of 
premium on all forms of insurance, also for low expense rate 
and its mortality, since organization is lower than that of 
any other American Company for a like period. All of its 
policies are on the “participating plan”, that is, the difference
between the premium and the cost of insurance is determined 
by experience, and returned to the policy-holder.’ 

The only insurance system with which the writer is familiar, 
where only net premiums seem to be charged, is the United 
States War Risk Insurance. 


§ 5. Surrender Values. 


At the moment when an individual takes out an insurance 
policy, his mathematical expectation is 0, that is to say, the 
sum which he expects to pay in net premiums has the same 
present value as the benefit looked for. As time goes on this 
equation ceases to hold. The expected benefit is greater than 
the expected outlay, and it would be increasingly advantageous 
to the Company for him to cancel the contract. The differ- 
ence between what the Company expects to receive from the 
premiums stipulated for in the past, and what it would 
expect from an individual of the same age, insuring himself 
for the first time, is called the ‘Surrender value’, and is about 
the sum which, in practice, a Company is willing to pay to
a policy-holder, after the first few years, in return for giving 
up his insurance. It can be calculated in various ways. 

Suppose that an individual aged $x+n$ took out a simple
life policy at the age $x$. The surrender value will be the
difference between the value of a new policy for a man of his
present age and the value of an annuity due of the amount
of his present premium, namely,

    ${}_{n}V_x = A_{x+n} - (1 + a_{x+n})\,P_x$

    $= \dfrac{M_{x+n}}{D_{x+n}} - \dfrac{N_{x+n-1}}{D_{x+n}}\cdot\dfrac{M_x}{N_{x-1}}$

    $= \dfrac{M_{x+n} N_{x-1} - M_x N_{x+n-1}}{D_{x+n} N_{x-1}}.$    (33)


This method of calculating is called the ‘ prospective method’. 
It is interesting to compare it with the ‘retrospective method ’, 
which may be explained as follows. 

At the time when the contract was first made, the prospective
value of the first $n$ payments was $(1 + {}_{|n-1}a_x)\,P_x$.
These payments had two functions: to provide for a temporary
insurance for $n$ years, and to provide the surrender value at
the end of that time. The difference between the limited
annuity due and the temporary insurance is the surrender
value, multiplied by the probability that the individual will
survive $n$ years, and discounted for $n$ years, i.e. an $n$ years'
endowment to the amount of the surrender value. We
thus have


    ${}_{n}V_x\,\dfrac{D_{x+n}}{D_x} = \dfrac{N_{x-1} - N_{x+n-1}}{D_x}\cdot\dfrac{M_x}{N_{x-1}} - \dfrac{M_x - M_{x+n}}{D_x}$

    $= \dfrac{M_{x+n} N_{x-1} - M_x N_{x+n-1}}{D_x N_{x-1}},$

    ${}_{n}V_x = \dfrac{M_{x+n} N_{x-1} - M_x N_{x+n-1}}{D_{x+n} N_{x-1}}.$    (34)
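
That the prospective and retrospective methods lead to the same figure may be confirmed numerically. The following is a sketch under stated assumptions, not the author's calculation: the life table, the rate of interest, and the function names are hypothetical, with $D_x = v^x l_x$ and $N_x = D_{x+1} + D_{x+2} + \dots$.

```python
# Hypothetical life table (not from the text): deaths spread evenly up to age 100.
LIVING = [1000 * (100 - age) for age in range(101)]  # LIVING[x] stands for l_x
OMEGA = len(LIVING) - 1                              # limiting age, where l_x = 0
V = 1 / 1.035                                        # assumed 3.5 per cent interest


def D(x):
    return V**x * LIVING[x]                          # D_x = v^x l_x


def M(x):
    return sum(V ** (y + 1) * (LIVING[y] - LIVING[y + 1]) for y in range(x, OMEGA))


def N(x):
    return sum(D(y) for y in range(x + 1, OMEGA + 1))  # N_x = D_{x+1} + ...


def surrender_prospective(x, n):
    """Formula (33): new policy at age x+n less an annuity due of the old premium."""
    premium = M(x) / N(x - 1)
    return M(x + n) / D(x + n) - N(x + n - 1) / D(x + n) * premium


def surrender_retrospective(x, n):
    """Formula (34): premiums paid less temporary insurance, carried forward n years."""
    premium = M(x) / N(x - 1)
    premiums_paid = (N(x - 1) - N(x + n - 1)) / D(x) * premium
    insurance_given = (M(x) - M(x + n)) / D(x)
    return (premiums_paid - insurance_given) * D(x) / D(x + n)


for years in (0, 10, 20, 30):
    print(years, surrender_prospective(35, years), surrender_retrospective(35, years))
```

At the moment of issue the value is 0, in agreement with the remark that the mathematical expectation is then 0, and in this toy table it grows with the age of the policy.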


As a matter of fact, policy-holders do not usually surrender 
their policies, and, in consequence, a large insurance company 
is obliged to have continually on hand a very large reserve. 
This great sum of money gives to the Company much impor- 
tance in the world of finance. Moreover, there is rather a nice 
ethical question as to who is, in reality, the owner of this 
reserve, and this question is by no means of merely academic 
interest, for it was once raised in a big lawsuit involving one of
the largest of the American companies. The policy-holders
maintained that the reserve was really the totality of surrender 
values, and so belonged to them, or at least they should have 
a voice in determining how it should be managed. The 
directors of the Company contended that as long as the 
institution was in a sound financial condition, and of this 
there was never the slightest question, and as long as they 
met all of their obligations with reasonable promptness, it 
was nobody’s business but their own, what they did with the 
reserve. This line of reasoning would seem flawless, were 
it not for the allurement of mutuality or co-operation which 
many companies hold out to prospective policy-holders. Just 
how much right has a policy-holder in a mutual company to 
a voice in its management? Questions of this sort are
interesting and important, but can hardly be said to fall
naturally under the head of mathematical probability. 


TABLES


TABLE A


The Common Logarithms of $e^x$ and $e^{-x}$.


x       | $\log_{10} e^{x}$ | $\log_{10} e^{-x}$ | x         | $\log_{10} e^{x}$ | $\log_{10} e^{-x}$
0.00001 | 0.0000043429 | 1.9999956571 |   0.08000 |   0.0347435586 |   1.9652564414
0.00002 | 0.0000086859 | 1.9999913141 |   0.09000 |   0.0390865034 |   1.9609134966
0.00003 | 0.0000130288 | 1.9999869712 |   0.10000 |   0.0434294482 |   1.9565705518
0.00004 | 0.0000173718 | 1.9999826282 |   0.20000 |   0.0868588964 |   1.9131411036
0.00005 | 0.0000217147 | 1.9999782853 |   0.30000 |   0.1302883446 |   1.8697116554
0.00006 | 0.0000260577 | 1.9999739423 |   0.40000 |   0.1737177928 |   1.8262822072
0.00007 | 0.0000304006 | 1.9999695994 |   0.50000 |   0.2171472410 |   1.7828527590
0.00008 | 0.0000347436 | 1.9999652564 |   0.60000 |   0.2605766891 |   1.7394233109
0.00009 | 0.0000390865 | 1.9999609135 |   0.70000 |   0.3040061373 |   1.6959938627
0.00010 | 0.0000434294 | 1.9999565706 |   0.80000 |   0.3474355855 |   1.6525644145
0.00020 | 0.0000868589 | 1.9999131411 |   0.90000 |   0.3908650337 |   1.6091349663
0.00030 | 0.0001302883 | 1.9998697117 |   1.00000 |   0.4342944819 |   1.5657055181
0.00040 | 0.0001737178 | 1.9998262822 |   2.00000 |   0.8685889638 |   1.1314110362
0.00050 | 0.0002171472 | 1.9997828528 |   3.00000 |   1.3028834457 |   2.6971165543
0.00060 | 0.0002605767 | 1.9997394233 |   4.00000 |   1.7371779276 |   2.2628220724
0.00070 | 0.0003040061 | 1.9996959939 |   5.00000 |   2.1714724095 |   3.8285275905
0.00080 | 0.0003474356 | 1.9996525644 |   6.00000 |   2.6057668914 |   3.3942331086
0.00090 | 0.0003908650 | 1.9996091350 |   7.00000 |   3.0400613733 |   4.9599386267
0.00100 | 0.0004342945 | 1.9995657055 |   8.00000 |   3.4743558552 |   4.5256441448
0.00200 | 0.0008685890 | 1.9991314110 |   9.00000 |   3.9086503371 |   4.0913496629
0.00300 | 0.0013028834 | 1.9986971166 |  10.00000 |   4.3429448190 |   5.6570551810
0.00400 | 0.0017371779 | 1.9982628221 |  20.00000 |   8.6858896381 |   9.3141103619
0.00500 | 0.0021714724 | 1.9978285276 |  30.00000 |  13.0288344571 |  14.9711655429
0.00600 | 0.0026057669 | 1.9973942331 |  40.00000 |  17.3717792761 |  18.6282207239
0.00700 | 0.0030400614 | 1.9969599386 |  50.00000 |  21.7147240952 |  22.2852759048
0.00800 | 0.0034743559 | 1.9965256441 |  60.00000 |  26.0576689142 |  27.9423310858
0.00900 | 0.0039086503 | 1.9960913497 |  70.00000 |  30.4006137332 |  31.5993862668
0.01000 | 0.0043429448 | 1.9956570552 |  80.00000 |  34.7435585523 |  35.2564414477
0.02000 | 0.0086858896 | 1.9913141104 |  90.00000 |  39.0865033713 |  40.9134966287
0.03000 | 0.0130288345 | 1.9869711655 | 100.00000 |  43.4294481903 |  44.5705518097
0.04000 | 0.0173717793 | 1.9826282207 | 200.00000 |  86.8588963807 |  87.1411036193
0.05000 | 0.0217147241 | 1.9782852759 | 300.00000 | 130.2883445710 | 131.7116554290
0.06000 | 0.0260576689 | 1.9739423311 | 400.00000 | 173.7177927613 | 174.2822072387
0.07000 | 0.0304006137 | 1.9695993863 | 500.00000 | 217.1472409516 | 218.8527590484

In the $\log_{10} e^{-x}$ columns the characteristic is negative and the mantissa positive; thus the entry 1.5657055181 for $x = 1$ stands for $-1 + 0.5657055181$.

Note: $\log e^{x+y} = \log e^{x} + \log e^{y}$. Thus, $\log e^{113.1478} = 49.139465180$.
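
The use of the note can be illustrated as follows. This is a small sketch, not part of the original table: the dictionary `TABLE_A` holds a few entries of the $\log_{10} e^{x}$ column, and the function name is hypothetical. The exponent 113.1478 is split into pieces that appear in the table, and the tabulated logarithms are added.

```python
import math

# A few (x, log10 e^x) pairs taken from the log10 e^x column of Table A.
TABLE_A = {
    100.0: 43.4294481903,
    10.0: 4.3429448190,
    3.0: 1.3028834457,
    0.1: 0.0434294482,
    0.04: 0.0173717793,
    0.007: 0.0030400614,
    0.0008: 0.0003474356,
}


def log10_exp_from_table(parts):
    """Add the tabulated logarithms of the pieces into which the exponent is split."""
    return sum(TABLE_A[part] for part in parts)


parts = [100.0, 10.0, 3.0, 0.1, 0.04, 0.007, 0.0008]  # 113.1478, one piece per decimal place
from_table = log10_exp_from_table(parts)
direct = 113.1478 * math.log10(math.e)                 # modern direct computation
print(from_table, direct)
```

The two figures differ only by the accumulated rounding of the seven tabular entries.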


TABLE B

The Probability Integral.

$\left(\dfrac{2}{\sqrt{\pi}}\int_0^x e^{-x^2}\,dx.\right)$

x    | 0       1     2     3     4     5     6     7     8     9
0.00 | 0.00000 00113 00226 00339 00451 00564 00677 00790 00903 01016 
0.01 | 0.01128 01241 01354 01467 01580 01692 01805 01918 02031 02144
0.02 | 0.02256 02369 02482 02595 02708 02820 02933 03046 03159 03271 
0.03 | 0.03384 03497 03610 03722 03835 03948 04060 04173 04286 04398 
0.04 | 0.04511 04624 04736 04849 04962 05074 05187 05299 05412 05525 
0.05 | 0.05637 05750 05862 05975 06087 06200 06312 06425 06537 06650 
0.06 | 0.06762 06875 06987 07099 07212 07324 07437 07549 07661 07773
0.07 | 0.07886 07998 08110 08223 08335 08447 08559 08671 08784 08896 
0.08 | 0.09008 09120 09232 09344 09456 09568 09680 09792 09904 10016 
0.09 | 0.10128 10240 10352 10464 10576 10687 10799 10911 11023 11135 
0.10 | 0.11246 11358 11470 11581 11693 11805 11916 12028 12139 12251 
0.11 | 0.12362 12474 12585 12697 12808 12919 13031 13142 13253 13365 
0.12 | 0.13476 13587 13698 13809 13921 14032 14143 14254 14365 14476 
0.13 | 0.14587 14698 14809 14919 15030 15141 15252 15363 15473 15584 
0.14 | 0.15695 15805 15916 16027 16137 16248 16358 16468 16579 16689 
0.15 | 0.16800 16910 17020 17130 17241 17351 17461 17571 17681 17791 
0.16 | 0.17901 18011 18121 18231 18341 18451 18560 18670 18780 18890
0.17 | 0.18999 19109 19218 19328 19437 19547 19656 19766 19875 19984 
0.18 | 0.20094 20203 20312 20421 20530 20639 20748 20857 20966 21075 
0.19 | 0.21184 21293 21402 21510 21619 21728 21836 21945 22053 22162 
0.20 | 0.22270 22379 22487 22595 22704 22812 22920 23028 23136 23244 
0.21 | 0.23352 23460 23568 23676 23784 23891 23999 24107 24214 24322 
0.22 | 0.24430 24537 24645 24752 24859 24967 25074 25181 25288 25395
0.23 | 0.25502 25609 25716 25823 25930 26037 26144 26250 26357 26463 
0.24 | 0.26570 26677 26783 26889 26996 27102 27208 27314 27421 27527
0.25 | 0.27633 27739 27845 27950 28056 28162 28268 28373 28479 28584 
0.26 | 0.28690 28795 28901 29006 29111 29217 29322 29427 29532 29637 
0.27 | 0.29742 29847 29952 30056 30161 30266 30370 30475 30579 30684 
0.28 | 0.30788 30892 30997 31101 31205 31309 31413 31517 31621 31725 
0.29 | 0.31828 31932 32036 32139 32243 32346 32450 32553 32656 32760
0.30 | 0.32863 32966 33069 33172 33275 33378 33480 33583 33686 33788 
0.31 | 0.33891 33993 34096 34198 34300 34403 34505 34607 34709 34811 
0.32 | 0.34913 35014 35116 35218 35319 35421 35523 35624 35725 35827
0.33 | 0.35928 36029 36130 36231 36332 36433 36534 36635 36735 36836 
0.34 | 0.36936 37037 37137 37238 37338 37438 37538 37638 37738 37838 
0.35 | 0.37938 38038 38138 38237 38337 38436 38536 38635 38735 38834
0.36 | 0.38933 39032 39131 39230 39329 39428 39526 39625 39724 39822
0.37 | 0.39921 40019 40117 40215 40314 40412 40510 40608 40705 40803 
0.38 | 0.40901 40999 41096 41194 41291 41388 41486 41583 41680 41777
0.39 | 0.41874 41971 42068 42164 42261 42358 42454 42550 42647 42743 
0.40 | 0.42839 42935 43031 43127 43223 43319 43415 43510 43606 43701 
0.41 | 0.43797 43892 43988 44083 44178 44273 44368 44463 44557 44652 
0.42 | 0.44747 44841 44936 45030 45124 45219 45313 45407 45501 45595 
0.43 | 0.45689 45782 45876 45970 46063 46157 46250 46343 46436 46529
0.44 | 0.46623 46715 46808 46901 46994 47086 47179 47271 47364 47456 
0.45 | 0.47548 47640 47732 47824 47916 48008 48100 48191 48283 48374 
0.46 | 0.48466 48557 48648 48739 48830 48921 49012 49103 49193 49284 
0.47 | 0.49375 49465 49555 49646 49736 49826 49916 50006 50096 50185 
0.48 | 0.50275 50365 50454 50543 50633 50722 50811 50900 50989 51078 
0.49 | 0.51167 51256 51344 51433 51521 51609 51698 51786 51874 51962


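x    | 0
0.50 | 0.52050
0.51 | 0.52924
0.52 | 0.53790
0.53 | 0.54646
0.54 | 0.55494
0.55 | 0.56332
0.56 | 0.57162
0.57 | 0.57982
0.58 | 0.58792
0.59 | 0.59594
0.60 | 0.60386
0.61 | 0.61168
0.62 | 0.61941
0.63 | 0.62705
0.64 | 0.63459
0.65 | 0.64203
0.66 | 0.64938
0.67 | 0.65663
0.68 | 0.66378
0.69 | 0.67084
0.70 | 0.67780
0.71 | 0.68467
0.72 | 0.69143
0.73 | 0.69810
0.74 | 0.70468
0.75 | 0.71116
0.76 | 0.71754
0.77 | 0.72382
0.78 | 0.73001
0.79 | 0.73610
0.80 | 0.74210
0.81 | 0.74800
0.82 | 0.75381
0.83 | 0.75952
0.84 | 0.76514
0.85 | 0.77067
0.86 | 0.77610
0.87 | 0.78144
0.88 | 0.78669
0.89 | 0.79184
0.90 | 0.79691
0.91 | 0.80188
0.92 | 0.80677
0.93 | 0.81156
0.94 | 0.81627
0.95 | 0.82089
0.96 | 0.82542
0.97 | 0.82987
0.98 | 0.83423
0.99 | 0.83851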


x    | 0       1     2     3     4     5     6     7     8     9
1.00 | 0.84270 84312 84353 84394 84435 84477 84518 84559 84600 84640
1.01 | 0.84681 84722 84762 84803 84843 84883 84924 84964 85004 85044
1.02 | 0.85084 85124 85163 85203 85243 85282 85322 85361 85400 85439
1.03 | 0.85478 85517 85556 85595 85634 85673 85711 85750 85788 85827
1.04 | 0.85865 85903 85941 85979 86017 86055 86093 86131 86169 86206
1.05 | 0.86244 86281 86318 86356 86393 86430 86467 86504 86541 86578
1.06 | 0.86614 86651 86688 86724 86760 86797 86833 86869 86905 86941
1.07 | 0.86977 87013 87049 87085 87120 87156 87191 87227 87262 87297
1.08 | 0.87333 87368 87403 87438 87473 87507 87542 87577 87611 87646
1.09 | 0.87680 87715 87749 87783 87817 87851 87885 87919 87953 87987
1.10 | 0.88021 88054 88088 88121 88155 88188 88221 88254 88287 88320
1.11 | 0.88353 88386 88419 88452 88484 88517 88549 88582 88614 88647
1.12 | 0.88679 88711 88743 88775 88807 88839 88871 88902 88934 88966
1.13 | 0.88997 89029 89060 89091 89122 89154 89185 89216 89247 89277
1.14 | 0.89308 89339 89370 89400 89431 89461 89492 89522 89552 89582
1.15 | 0.89612 89642 89672 89702 89732 89762 89792 89821 89851 89880
1.16 | 0.89910 89939 89968 89997 90027 90056 90085 90114 90142 90171
1.17 | 0.90200 90229 90257 90286 90314 90343 90371 90399 90428 90456
1.18 | 0.90484 90512 90540 90568 90595 90623 90651 90678 90706 90733
1.19 | 0.90761 90788 90815 90843 90870 90897 90924 90951 90978 91005
1.20 | 0.91031 91058 91085 91111 91138 91164 91191 91217 91243 91269
1.21 | 0.91296 91322 91348 91374 91399 91425 91451 91477 91502 91528
1.22 | 0.91553 91579 91604 91630 91655 91680 91705 91730 91755 91780
1.23 | 0.91805 91830 91855 91879 91904 91929 91953 91978 92002 92026
1.24 | 0.92051 92075 92099 92123 92147 92171 92195 92219 92243 92266
1.25 | 0.92290 92314 92337 92361 92384 92408 92431 92454 92477 92500
1.26 | 0.92524 92547 92570 92593 92615 92638 92661 92684 92706 92729
1.27 | 0.92751 92774 92796 92819 92841 92863 92885 92907 92929 92951
1.28 | 0.92973 92995 93017 93039 93061 93082 93104 93126 93147 93168
1.29 | 0.93190 93211 93232 93254 93275 93296 93317 93338 93359 93380
1.30 | 0.93401 93422 93442 93463 93484 93504 93525 93545 93566 93586
1.31 | 0.93606 93627 93647 93667 93687 93707 93727 93747 93767 93787
1.32 | 0.93807 93826 93846 93866 93885 93905 93924 93944 93963 93982
1.33 | 0.94002 94021 94040 94059 94078 94097 94116 94135 94154 94173
1.34 | 0.94191 94210 94229 94247 94266 94284 94303 94321 94340 94358
1.35 | 0.94376 94394 94413 94431 94449 94467 94485 94503 94521 94538
1.36 | 0.94556 94574 94592 94609 94627 94644 94662 94679 94697 94714
1.37 | 0.94731 94748 94766 94783 94800 94817 94834 94851 94868 94885
1.38 | 0.94902 94918 94935 94952 94968 94985 95002 95018 95035 95051
1.39 | 0.95067 95084 95100 95116 95132 95148 95165 95181 95197 95213
1.40 | 0.95229 95244 95260 95276 95292 95307 95323 95339 95354 95370
1.41 | 0.95385 95401 95416 95431 95447 95462 95477 95492 95507 95523
1.42 | 0.95538 95553 95568 95582 95597 95612 95627 95642 95656 95671
1.43 | 0.95686 95700 95715 95729 95744 95758 95773 95787 95801 95815
1.44 | 0.95830 95844 95858 95872 95886 95900 95914 95928 95942 95956
1.45 | 0.95970 95983 95997 96011 96024 96038 96051 96065 96078 96092
1.46 | 0.96105 96119 96132 96145 96159 96172 96185 96198 96211 96224
1.47 | 0.96237 96250 96263 96276 96289 96302 96315 96327 96340 96353
1.48 | 0.96365 96378 96391 96403 96416 96428 96440 96453 96465 96478
1.49 | 0.96490 96502 96514 96526 96539 96551 96563 96575 96587 96599


x    | 0       2     4     6     8     | x    | 0       2     4     6     8
1.50 | 0.96611 96634 96658 96681 96705 | 2.00 | 0.99532 99536 99540 99544 99548
1.51 | 0.96728 96751 96774 96796 96819 | 2.01 | 0.99552 99556 99560 99564 99568
1.52 | 0.96841 96864 96886 96908 96930 | 2.02 | 0.99572 99576 99580 99583 99587
1.53 | 0.96952 96973 96995 97016 97037 | 2.03 | 0.99591 99594 99598 99601 99605
1.54 | 0.97059 97080 97100 97121 97142 | 2.04 | 0.99609 99612 99616 99619 99622
1.55 | 0.97162 97183 97203 97223 97243 | 2.05 | 0.99626 99629 99633 99636 99639
1.56 | 0.97263 97283 97302 97322 97341 | 2.06 | 0.99642 99646 99649 99652 99655
1.57 | 0.97360 97379 97398 97417 97436 | 2.07 | 0.99658 99661 99664 99667 99670
1.58 | 0.97455 97473 97492 97510 97528 | 2.08 | 0.99673 99676 99679 99682 99685
1.59 | 0.97546 97564 97582 97600 97617 | 2.09 | 0.99688 99691 99694 99697 99699
1.60 | 0.97635 97652 97670 97687 97704 | 2.10 | 0.99702 99705 99707 99710 99713
1.61 | 0.97721 97738 97754 97771 97787 | 2.11 | 0.99715 99718 99721 99723 99726
1.62 | 0.97804 97820 97836 97852 97868 | 2.12 | 0.99728 99731 99733 99736 99738
1.63 | 0.97884 97900 97916 97931 97947 | 2.13 | 0.99741 99743 99745 99748 99750
1.64 | 0.97962 97977 97993 98008 98023 | 2.14 | 0.99753 99755 99757 99759 99762
1.65 | 0.98038 98052 98067 98082 98096 | 2.15 | 0.99764 99766 99768 99770 99773
1.66 | 0.98110 98125 98139 98153 98167 | 2.16 | 0.99775 99777 99779 99781 99783
1.67 | 0.98181 98195 98209 98222 98236 | 2.17 | 0.99785 99787 99789 99791 99793
1.68 | 0.98249 98263 98276 98289 98302 | 2.18 | 0.99795 99797 99799 99801 99803
1.69 | 0.98315 98328 98341 98354 98366 | 2.19 | 0.99805 99806 99808 99810 99812
1.70 | 0.98379 98392 98404 98416 98429 | 2.20 | 0.99814 99815 99817 99819 99821
1.71 | 0.98441 98453 98465 98477 98489 | 2.21 | 0.99822 99824 99826 99827 99829
1.72 | 0.98500 98512 98524 98535 98546 | 2.22 | 0.99831 99832 99834 99836 99837
1.73 | 0.98558 98569 98580 98591 98602 | 2.23 | 0.99839 99840 99842 99843 99845
1.74 | 0.98613 98624 98635 98646 98657 | 2.24 | 0.99846 99848 99849 99851 99852
1.75 | 0.98667 98678 98688 98699 98709 | 2.25 | 0.99854 99855 99857 99858 99859
1.76 | 0.98719 98729 98739 98749 98759 | 2.26 | 0.99861 99862 99863 99865 99866
1.77 | 0.98769 98779 98789 98798 98808 | 2.27 | 0.99867 99869 99870 99871 99873
1.78 | 0.98817 98827 98836 98846 98855 | 2.28 | 0.99874 99875 99876 99877 99879
1.79 | 0.98864 98873 98882 98891 98900 | 2.29 | 0.99880 99881 99882 99883 99885
1.80 | 0.98909 98918 98927 98935 98944 | 2.30 | 0.99886 99887 99888 99889 99890
1.81 | 0.98952 98961 98969 98978 98986 | 2.31 | 0.99891 99892 99893 99894 99896
1.82 | 0.98994 99003 99011 99019 99027 | 2.32 | 0.99897 99898 99899 99900 99901
1.83 | 0.99035 99043 99050 99058 99066 | 2.33 | 0.99902 99903 99904 99905 99906
1.84 | 0.99074 99081 99089 99096 99104 | 2.34 | 0.99906 99907 99908 99909 99910
1.85 | 0.99111 99118 99126 99133 99140 | 2.35 | 0.99911 99912 99913 99914 99915
1.86 | 0.99147 99154 99161 99168 99175 | 2.36 | 0.99915 99916 99917 99918 99919
1.87 | 0.99182 99189 99196 99202 99209 | 2.37 | 0.99920 99920 99921 99922 99923
1.88 | 0.99216 99222 99229 99235 99242 | 2.38 | 0.99924 99924 99925 99926 99927
1.89 | 0.99248 99254 99261 99267 99273 | 2.39 | 0.99928 99928 99929 99930 99930
1.90 | 0.99279 99285 99291 99297 99303 | 2.40 | 0.99931 99932 99933 99933 99934
1.91 | 0.99309 99315 99321 99326 99332 | 2.41 | 0.99935 99935 99936 99937 99937
1.92 | 0.99338 99343 99349 99355 99360 | 2.42 | 0.99938 99939 99939 99940 99940
1.93 | 0.99366 99371 99376 99382 99387 | 2.43 | 0.99941 99942 99942 99943 99943
1.94 | 0.99392 99397 99403 99408 99413 | 2.44 | 0.99944 99945 99945 99946 99946
1.95 | 0.99418 99423 99428 99433 99438 | 2.45 | 0.99947 99947 99948 99949 99949
1.96 | 0.99443 99447 99452 99457 99462 | 2.46 | 0.99950 99950 99951 99951 99952
1.97 | 0.99466 99471 99476 99480 99485 | 2.47 | 0.99952 99953 99953 99954 99954
1.98 | 0.99489 99494 99498 99502 99507 | 2.48 | 0.99955 99955 99956 99956 99957
1.99 | 0.99511 99515 99520 99524 99528 | 2.49 | 0.99957 99958 99958 99958 99959
2.00 | 0.99532 99536 99540 99544 99548 | 2.50 | 0.99959 99960 99960 99961 99961


x   | 0       1     2     3     4     5     6     7     8     9
2.5 | 0.99959 99961 99963 99965 99967 99969 99971 99972 99974 99975
2.6 | 0.99976 99978 99979 99980 99981 99982 99983 99984 99985 99986
2.7 | 0.99987 99987 99988 99989 99989 99990 99991 99991 99992 99992
2.8 | 0.99992 99993 99993 99994 99994 99994 99995 99995 99995 99996
2.9 | 0.99996 99996 99996 99997 99997 99997 99997 99997 99997 99998
3.0 | 0.99998 99998 99998 99998 99998 99998 99998 99998 99999 99999


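The value, $J$, of the Probability Integral may always be found from the convergent
series

    $J = \dfrac{2}{\sqrt{\pi}}\left(x - \dfrac{x^3}{1!\,3} + \dfrac{x^5}{2!\,5} - \dfrac{x^7}{3!\,7} + \dots\right),$

but for large values of $x$, the semiconvergent series

    $J = 1 - \dfrac{e^{-x^2}}{x\sqrt{\pi}}\left(1 - \dfrac{1}{2x^2} + \dfrac{1\cdot 3}{(2x^2)^2} - \dfrac{1\cdot 3\cdot 5}{(2x^2)^3} + \dots\right)$

is convenient.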
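
The two series may be tried against a modern library routine. The sketch below is an illustration and not the method by which the table was computed: the function names and the number of terms are assumptions, and the semiconvergent series is stopped at its smallest term, which is the usual rule for such series.

```python
import math


def integral_convergent(x, terms=60):
    """Power series: (2/sqrt(pi)) * sum of (-1)^k x^(2k+1) / (k! (2k+1))."""
    total, term = 0.0, x                 # term holds (-1)^k x^(2k+1) / k!
    for k in range(terms):
        total += term / (2 * k + 1)
        term *= -x * x / (k + 1)
    return 2 / math.sqrt(math.pi) * total


def integral_semiconvergent(x):
    """Series in powers of 1/(2x^2), summed until the terms cease to decrease."""
    total, term, k = 0.0, 1.0, 0         # term holds (-1)^k 1*3*...*(2k-1) / (2x^2)^k
    while True:
        total += term
        next_term = -term * (2 * k + 1) / (2 * x * x)
        if abs(next_term) >= abs(term):  # the terms have begun to grow: stop here
            break
        term, k = next_term, k + 1
    return 1 - math.exp(-x * x) / (x * math.sqrt(math.pi)) * total


for x in (0.5, 1.0, 3.0):
    print(x, integral_convergent(x), integral_semiconvergent(x), math.erf(x))
```

For small $x$ the second series is useless, since its terms begin to grow almost at once; for $x = 3$ it agrees with the first to more places than the table gives.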


SUBJECT-INDEX 


Annuity, 196, 200, 201. 
Assumptions, empirical, 4, 6, 10. 


Banker, defined, 52. 


Cause, defined, 88. 

Chance, Banker’s, 52, 53, 55. 
Chance, games of, 52 ff. 

Chance, Player’s, 52, 53, 55. 
Chaos, molecular, 188. 

Coefficient, correlation, 145-9. 

— — defined, 146. 
Combinations, formulae for, 13, 14. 
Conditions, essential, defined, 5. 
Correlation, strong and weak, 147. 
Craps, 21, 51. 

Curve fitting, 168 ff. 


Deviation, standard, defined, 66. 

Discrepancy, average, 50. 

—, defined, 34. 

—, mean, 49.

—, probable, 50. 

Dispersion, defined, 66. 

—, normal, 68. 

—, sub-normal, 69. 

—, super-normal, 69. 

—, theorem, 67. 

Distribution, normal, for velocities, 
179. 


Ellipse, error, 142.

—, probable, 143. 

Endowment, 195. 

Equations, normal, 153, 154, 157,
159.

—, residual, 154.

Errors, accidental, defined, 102. 
—, assumptions for, 103, 114, 116. 
—, average, 107, 119, 121, 123.
—, constant, defined, 102. 

—, formulae, 115, 121, 123.

—, fundamental, 103, 104. 


Errors, Gaussian law for, 107,
113 ff., 152.

—, — — —, in many variables,
130 ff.

—, mean, 107, 108, 111.

—, probable, 107, 119, 121, 123,
162.

—, residual, 109, 120, 121.

Events, compound, 17.

—, independent, 18.

Expectation, defined, 25.

— of life, 197.


Factorial, defined, 13. 
Fair, defined, 26. 
Favourable, defined, 26. 
Force, of interest, 195. 
—, of mortality, 192. 


Gas, assumed properties, 171, 172, 
175 


—, statistical theory, Chap. X. 
Graduation of mortality, statistics, 
192. 


Hyperspace, defined, 173. 


Inequality, Tchebycheff’s, 64, 67. 
Insurance, 199, 200, 201. 
Integral, probability, 48, 45. 

—, —, tables, 209-13. 


Law, Maxwell’s, 176 ff. 
—, Poisson's, 65. 
Likely, equally, 7-10. 


Mean, weighted, 106, 109, 131. 
Median, 113.

Mode, 113.

Moments, method of, 168. 
Monte Carlo, 56, 57, 58. 
Mortality, force of, 192. 




Observations, conditional, 162. 
—, doubtful, 125. 


Paradox, Bertrand’s box, 90. 

—, —, geometrical, 75. 

—, Petrograd, 27, 28. 

Player, defined, 52. 

Poker, 21, 51. 

Precision, 118, 121, 123, 162. 

Premiums, 201, 202, 203.

Principle, Bayes’, 89, 100. 

—, —, for future events, 97. 

Probability, compound, 18, 22, 23, 
24 


—, defined, 1-5. 

—, of survival, 191, 192. 

—, total, 17, 24. 

eed Buffon’s needle, 80, 81, 
ee 


Ratio, correlation, 149, 150. 
Regression, line of, 148. 
Reserve, for insurance, 203. 
Risk, 30 ff. 

Roulette, 56, 57, 58. 


Ruin, chance of, Chap. III, § 4. 
—, defined, 52. 


Series, Bernoulli, 68, 71. 

—, Lexis, 69, 71. 

—, Poisson, 40, 70, 71. 

State, normal for gas, 179, 180. 
Sunrise, probability for, 99. 


Theorem, Bernoulli’s, 22, 37, 42, 
48. 

—, —, converse to, 93, 94. 

—, Duhamel’s, 43. 

—, fundamental dispersion, 67. 

—, fundamental form, games of 
chance, 55. 

—, Laplace’s, 45, 46. 

Turns, fair, favourable, and un- 
favourable, 26. 


Unfavourable, defined, 26. 
Weights, in direct measurements, 


109, 132. 
—, indirect measurements, 159 ff. 


INDEX OF AUTHORS


Airy, 127. 
Author, 52, 66, 130. 


Barbieri, 81. 

Bayes, 89. 

Beetle, 104. 

Bernoulli, Daniel, 28. 

Bernoulli, James (Jacob), 1, 7, 82, 
68, 71. 

Bertrand, 28, 50, 58, 65, 75, 90, 92, 
95, 115, 125, 148, 180, 194. 

Borel, 175. 

Bravais, 147.

Brown, 51. 

Buffon, 80. 


Castelnuovo, 45, 72, 100, 171, 187. 

Cournot, 7. 

Czuber, 28, 42, 50, 78, 89, 99, 189, 
192, 194. 


Eldredge, 123. 


Fechner, 113. 
Fermat, 20. 
Fisher, 70, 170. 
Forsyth, 70. 
Fox, 82.


Gauss, 117. 
Glaisher, 128. 
Gompertz, 193. 
Greiner, 138. 
Grinwald, 50. 


Hagen, 118. 
Hoar, 21, 51. 
Jeans, 171, 179, 187, 188. 


Keynes, 2, 5, 8, 37. 
King, 189, 196. 
Kipling, 100. 

von Krieg, 8, 9. 


AUTHORS 


Laplace, Ua AD. 
Lazzerini, 81. 
Lexis, 69, 71. 


Makeham, 192. 
Marbe, 50. 
Markhoff, 21, 45. 
Maxim, 56. 
Maxwell, 171, 176, 179. 
de Mere, 20, 29. 
Mill, 3, 5, 6. 
Miller, 123. 

von Mises, 5, 130. 
de Moivre, 53. 

de Montmort, 24. 


Osgood, 43, 45. 


Pascal, 20, 29. 

Pearson, 50, 138, 168, 170.
Peirce, 126. 

Poincaré, 76, 115, 117. 


Schiaparelli, 104. 
Schimmack, 104. 
Simmons, 47. 
Stewart, 127. 
Stirling, 41. 


Stone, 104, 127.


Stumpf, 8. 


Tannery, 22. 
Tchebycheff, 21, 64, 122. 


Venn, 5, 6. 


Winlock, 127. 
Wolff, 52. 


Yule, 147. 


